linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c
@ 2022-01-31 22:08 Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 01/17] RDMA/rxe: Move rxe_mcast_add/delete " Bob Pearson
                   ` (17 more replies)
  0 siblings, 18 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

This patch series separates the mc_grp and mc_elem object pools from
rxe_pools.c and moves their code to rxe_mcast.c. This makes sense because
these two pools are different from the other pools as the only ones that
do not share objects with rdma-core and that use key's instead of
indices to enable looking up objects. This change will enable a significant
simplification of the normal object pools.

Note: the independent implementation takes references to multicast groups
and also to pointers stored in the red-black trees holding the keys. The
reference handling code is moved to be adjacent to the code that manages
the pointers.

This patch series applies cleanly to current for-next.
commit e783362eb54cd99b2cac8b3a9aeac942e6f6ac07 (tag: v5.17-rc1,
		origin/wip/jgg-for-rc, origin/wip/jgg-for-next,
		origin/wip/for-testing, origin/for-rc,
		origin/for-next, origin/HEAD, for-next)

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v10
  Corrected issues reported by Jason Gunthorpe
  Isolated patches 01-17 separate from the remaining patches.
  They will be submitted later
v9
  Corrected issues reported by Jason Gunthorpe,
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorp
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (17):
  RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
  RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
  RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
  RDMA/rxe: Enforce IBA o10-2.2.3
  RDMA/rxe: Remove rxe_drop_all_macst_groups
  RDMA/rxe: Remove qp->grp_lock and qp->grp_list
  RDMA/rxe: Use kzmalloc/kfree for mca
  RDMA/rxe: Rename grp to mcg and mce to mca
  RDMA/rxe: Introduce RXECB(skb)
  RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
  RDMA/rxe: Replace mcg locks by rxe->mcg_lock
  RDMA/rxe: Replace pool key by rxe->mcg_tree
  RDMA/rxe: Remove key'ed object support
  RDMA/rxe: Remove mcg from rxe pools
  RDMA/rxe: Add code to cleanup mcast memory
  RDMA/rxe: Add comments to rxe_mcast.c
  RDMA/rxe: Finish cleanup of rxe_mcast.c

 drivers/infiniband/sw/rxe/rxe.c       |  21 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h   |   3 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |  28 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 684 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_net.c   |  18 -
 drivers/infiniband/sw/rxe/rxe_pool.c  | 137 ------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  40 --
 drivers/infiniband/sw/rxe/rxe_qp.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  | 104 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  31 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  25 +-
 11 files changed, 614 insertions(+), 496 deletions(-)
 rewrite drivers/infiniband/sw/rxe/rxe_mcast.c (86%)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 01/17] RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 02/17] RDMA/rxe: Move rxe_mcast_attach/detach " Bob Pearson
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move rxe_mcast_add and rxe_mcast_delete from rxe_net.c to rxe_mcast.c,
make static and remove declarations from rxe_loc.h.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  2 --
 drivers/infiniband/sw/rxe/rxe_mcast.c | 18 ++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_net.c   | 18 ------------------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..bcec33c3c3b7 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -106,8 +106,6 @@ int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb);
 const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
-int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid);
-int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid);
 
 /* rxe_qp.c */
 int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index bd1ac88b8700..e5689c161984 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -7,6 +7,24 @@
 #include "rxe.h"
 #include "rxe_loc.h"
 
+static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+
+	return dev_mc_add(rxe->ndev, ll_addr);
+}
+
+static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+
+	return dev_mc_del(rxe->ndev, ll_addr);
+}
+
 /* caller should hold mc_grp_pool->pool_lock */
 static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index be72bdbfb4ba..a8cfa7160478 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -20,24 +20,6 @@
 
 static struct rxe_recv_sockets recv_sockets;
 
-int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	unsigned char ll_addr[ETH_ALEN];
-
-	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
-
-	return dev_mc_add(rxe->ndev, ll_addr);
-}
-
-int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	unsigned char ll_addr[ETH_ALEN];
-
-	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
-
-	return dev_mc_del(rxe->ndev, ll_addr);
-}
-
 static struct dst_entry *rxe_find_route4(struct net_device *ndev,
 				  struct in_addr *saddr,
 				  struct in_addr *daddr)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 02/17] RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 01/17] RDMA/rxe: Move rxe_mcast_add/delete " Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 03/17] RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem Bob Pearson
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move rxe_mcast_attach and rxe_mcast_detach from rxe_verbs.c to rxe_mcast.c,
Make non-static and add declarations to rxe_loc.h. Make the subroutines
in rxe_mcast.c referenced by these routines static and remove their
declarations from rxe_loc.h.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   | 12 ++-------
 drivers/infiniband/sw/rxe/rxe_mcast.c | 36 +++++++++++++++++++++++----
 drivers/infiniband/sw/rxe/rxe_verbs.c | 26 -------------------
 3 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index bcec33c3c3b7..dc606241f0d6 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,18 +40,10 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p);
-
-int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mc_grp *grp);
-
-int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid);
-
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
-
 void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
+int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index e5689c161984..f86e32f4e77f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -52,8 +52,8 @@ static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 	return grp;
 }
 
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p)
+static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+			     struct rxe_mc_grp **grp_p)
 {
 	int err;
 	struct rxe_mc_grp *grp;
@@ -81,7 +81,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	return 0;
 }
 
-int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mc_grp *grp)
 {
 	int err;
@@ -125,8 +125,8 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
-int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid)
+static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+				   union ib_gid *mgid)
 {
 	struct rxe_mc_grp *grp;
 	struct rxe_mc_elem *elem, *tmp;
@@ -194,3 +194,29 @@ void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 	rxe_drop_key(grp);
 	rxe_mcast_delete(rxe, &grp->mgid);
 }
+
+int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mc_grp *grp;
+
+	/* takes a ref on grp if successful */
+	err = rxe_mcast_get_grp(rxe, mgid, &grp);
+	if (err)
+		return err;
+
+	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+
+	rxe_drop_ref(grp);
+	return err;
+}
+
+int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 915ad6664321..f7682541f9af 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -999,32 +999,6 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 	return n;
 }
 
-static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
-{
-	int err;
-	struct rxe_dev *rxe = to_rdev(ibqp->device);
-	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mc_grp *grp;
-
-	/* takes a ref on grp if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
-	if (err)
-		return err;
-
-	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
-
-	rxe_drop_ref(grp);
-	return err;
-}
-
-static int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
-{
-	struct rxe_dev *rxe = to_rdev(ibqp->device);
-	struct rxe_qp *qp = to_rqp(ibqp);
-
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
-}
-
 static ssize_t parent_show(struct device *device,
 			   struct device_attribute *attr, char *buf)
 {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 03/17] RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 01/17] RDMA/rxe: Move rxe_mcast_add/delete " Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 02/17] RDMA/rxe: Move rxe_mcast_attach/detach " Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 04/17] RDMA/rxe: Enforce IBA o10-2.2.3 Bob Pearson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Rename rxe_mc_grp to rxe_mcg. Rename rxe_mc_elem to rxe_mca.
These can be read 'multicast group' and 'multicast attachment'.
'elem' collided with the use of elem in rxe pools and was a little
confusing.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 26 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_pool.c  | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  6 +++---
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index f86e32f4e77f..949784198d80 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,12 +26,12 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold mc_grp_pool->pool_lock */
-static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
+static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
 				     union ib_gid *mgid)
 {
 	int err;
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 
 	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
 	if (!grp)
@@ -53,10 +53,10 @@ static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 }
 
 static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mc_grp **grp_p)
+			     struct rxe_mcg **grp_p)
 {
 	int err;
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
@@ -82,10 +82,10 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mc_grp *grp)
+			   struct rxe_mcg *grp)
 {
 	int err;
-	struct rxe_mc_elem *elem;
+	struct rxe_mca *elem;
 
 	/* check to see of the qp is already a member of the group */
 	spin_lock_bh(&qp->grp_lock);
@@ -128,8 +128,8 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
-	struct rxe_mc_grp *grp;
-	struct rxe_mc_elem *elem, *tmp;
+	struct rxe_mcg *grp;
+	struct rxe_mca *elem, *tmp;
 
 	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
 	if (!grp)
@@ -162,8 +162,8 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 {
-	struct rxe_mc_grp *grp;
-	struct rxe_mc_elem *elem;
+	struct rxe_mcg *grp;
+	struct rxe_mca *elem;
 
 	while (1) {
 		spin_lock_bh(&qp->grp_lock);
@@ -171,7 +171,7 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 			spin_unlock_bh(&qp->grp_lock);
 			break;
 		}
-		elem = list_first_entry(&qp->grp_list, struct rxe_mc_elem,
+		elem = list_first_entry(&qp->grp_list, struct rxe_mca,
 					grp_list);
 		list_del(&elem->grp_list);
 		spin_unlock_bh(&qp->grp_lock);
@@ -188,7 +188,7 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mc_grp *grp = container_of(elem, typeof(*grp), elem);
+	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
 	struct rxe_dev *rxe = grp->rxe;
 
 	rxe_drop_key(grp);
@@ -200,7 +200,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 
 	/* takes a ref on grp if successful */
 	err = rxe_mcast_get_grp(rxe, mgid, &grp);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 4cb003885e00..63c594173565 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -83,17 +83,17 @@ static const struct rxe_type_info {
 	},
 	[RXE_TYPE_MC_GRP] = {
 		.name		= "rxe-mc_grp",
-		.size		= sizeof(struct rxe_mc_grp),
-		.elem_offset	= offsetof(struct rxe_mc_grp, elem),
+		.size		= sizeof(struct rxe_mcg),
+		.elem_offset	= offsetof(struct rxe_mcg, elem),
 		.cleanup	= rxe_mc_cleanup,
 		.flags		= RXE_POOL_KEY,
-		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
+		.key_offset	= offsetof(struct rxe_mcg, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
 	[RXE_TYPE_MC_ELEM] = {
 		.name		= "rxe-mc_elem",
-		.size		= sizeof(struct rxe_mc_elem),
-		.elem_offset	= offsetof(struct rxe_mc_elem, elem),
+		.size		= sizeof(struct rxe_mca),
+		.elem_offset	= offsetof(struct rxe_mca, elem),
 	},
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 6a6cc1fa90e4..7ff6b53555f4 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -233,8 +233,8 @@ static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-	struct rxe_mc_grp *mcg;
-	struct rxe_mc_elem *mce;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mce;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
 	int err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e48969e8d4c8..388b7dc23dd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,7 +353,7 @@ struct rxe_mw {
 	u64			length;
 };
 
-struct rxe_mc_grp {
+struct rxe_mcg {
 	struct rxe_pool_elem	elem;
 	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
@@ -364,12 +364,12 @@ struct rxe_mc_grp {
 	u16			pkey;
 };
 
-struct rxe_mc_elem {
+struct rxe_mca {
 	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct list_head	grp_list;
 	struct rxe_qp		*qp;
-	struct rxe_mc_grp	*grp;
+	struct rxe_mcg		*grp;
 };
 
 struct rxe_port {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 04/17] RDMA/rxe: Enforce IBA o10-2.2.3
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (2 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 03/17] RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 05/17] RDMA/rxe: Remove rxe_drop_all_macst_groups Bob Pearson
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add code to check if a QP is attached to one or more multicast groups
when destroy_qp is called and return an error if so.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  9 +--------
 drivers/infiniband/sw/rxe/rxe_mcast.c |  2 ++
 drivers/infiniband/sw/rxe/rxe_qp.c    | 14 ++++++++++++++
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 +++++
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index dc606241f0d6..052beaaacf43 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -101,26 +101,19 @@ const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
 
 /* rxe_qp.c */
 int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
-
 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 		     struct ib_qp_init_attr *init,
 		     struct rxe_create_qp_resp __user *uresp,
 		     struct ib_pd *ibpd, struct ib_udata *udata);
-
 int rxe_qp_to_init(struct rxe_qp *qp, struct ib_qp_init_attr *init);
-
 int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
 		    struct ib_qp_attr *attr, int mask);
-
 int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr,
 		     int mask, struct ib_udata *udata);
-
 int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask);
-
 void rxe_qp_error(struct rxe_qp *qp);
-
+int rxe_qp_chk_destroy(struct rxe_qp *qp);
 void rxe_qp_destroy(struct rxe_qp *qp);
-
 void rxe_qp_cleanup(struct rxe_pool_elem *elem);
 
 static inline int qp_num(struct rxe_qp *qp)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 949784198d80..34e3c52f0b72 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -114,6 +114,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	grp->num_qp++;
 	elem->qp = qp;
 	elem->grp = grp;
+	atomic_inc(&qp->mcg_num);
 
 	list_add(&elem->qp_list, &grp->qp_list);
 	list_add(&elem->grp_list, &qp->grp_list);
@@ -143,6 +144,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			list_del(&elem->qp_list);
 			list_del(&elem->grp_list);
 			grp->num_qp--;
+			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
 			spin_unlock_bh(&qp->grp_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 5018b9387694..99284337f547 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -770,6 +770,20 @@ int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask)
 	return 0;
 }
 
+int rxe_qp_chk_destroy(struct rxe_qp *qp)
+{
+	/* See IBA o10-2.2.3
+	 * An attempt to destroy a QP while attached to a mcast group
+	 * will fail immediately.
+	 */
+	if (atomic_read(&qp->mcg_num)) {
+		pr_debug_once("Attempt to destroy QP while attached to multicast group\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
 /* called by the destroy qp verb */
 void rxe_qp_destroy(struct rxe_qp *qp)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f7682541f9af..9f0aef4b649d 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -493,6 +493,11 @@ static int rxe_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
 	struct rxe_qp *qp = to_rqp(ibqp);
+	int ret;
+
+	ret = rxe_qp_chk_destroy(qp);
+	if (ret)
+		return ret;
 
 	rxe_qp_destroy(qp);
 	rxe_drop_index(qp);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 388b7dc23dd7..4910d0782e33 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -235,6 +235,7 @@ struct rxe_qp {
 	/* list of mcast groups qp has joined (for cleanup) */
 	struct list_head	grp_list;
 	spinlock_t		grp_lock; /* guard grp_list */
+	atomic_t		mcg_num;
 
 	struct sk_buff_head	req_pkts;
 	struct sk_buff_head	resp_pkts;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 05/17] RDMA/rxe: Remove rxe_drop_all_macst_groups
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (3 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 04/17] RDMA/rxe: Enforce IBA o10-2.2.3 Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 06/17] RDMA/rxe: Remove qp->grp_lock and qp->grp_list Bob Pearson
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

With o10-2.2.3 enforced rxe_drop_all_mcast_groups is completely
unnecessary. Remove it and references to it.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 -
 drivers/infiniband/sw/rxe/rxe_mcast.c | 26 --------------------------
 drivers/infiniband/sw/rxe/rxe_qp.c    |  2 --
 3 files changed, 29 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 052beaaacf43..af40e3c212fb 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,7 +40,6 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
 void rxe_mc_cleanup(struct rxe_pool_elem *arg);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 34e3c52f0b72..39a41daa7a6b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -162,32 +162,6 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return -EINVAL;
 }
 
-void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
-{
-	struct rxe_mcg *grp;
-	struct rxe_mca *elem;
-
-	while (1) {
-		spin_lock_bh(&qp->grp_lock);
-		if (list_empty(&qp->grp_list)) {
-			spin_unlock_bh(&qp->grp_lock);
-			break;
-		}
-		elem = list_first_entry(&qp->grp_list, struct rxe_mca,
-					grp_list);
-		list_del(&elem->grp_list);
-		spin_unlock_bh(&qp->grp_lock);
-
-		grp = elem->grp;
-		spin_lock_bh(&grp->mcg_lock);
-		list_del(&elem->qp_list);
-		grp->num_qp--;
-		spin_unlock_bh(&grp->mcg_lock);
-		rxe_drop_ref(grp);
-		rxe_drop_ref(elem);
-	}
-}
-
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
 	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 99284337f547..a21d704dc376 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -812,8 +812,6 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 {
 	struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work.work);
 
-	rxe_drop_all_mcast_groups(qp);
-
 	if (qp->sq.queue)
 		rxe_queue_cleanup(qp->sq.queue);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 06/17] RDMA/rxe: Remove qp->grp_lock and qp->grp_list
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (4 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 05/17] RDMA/rxe: Remove rxe_drop_all_macst_groups Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Since it is no longer required to cleanup attachments to multicast
groups when a QP is destroyed qp->grp_lock and qp->grp_list are
no longer needed and are removed.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 8 --------
 drivers/infiniband/sw/rxe/rxe_qp.c    | 3 ---
 drivers/infiniband/sw/rxe/rxe_verbs.h | 5 -----
 3 files changed, 16 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 39a41daa7a6b..9336295c4ee2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -88,7 +88,6 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *elem;
 
 	/* check to see of the qp is already a member of the group */
-	spin_lock_bh(&qp->grp_lock);
 	spin_lock_bh(&grp->mcg_lock);
 	list_for_each_entry(elem, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
@@ -113,16 +112,13 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	grp->num_qp++;
 	elem->qp = qp;
-	elem->grp = grp;
 	atomic_inc(&qp->mcg_num);
 
 	list_add(&elem->qp_list, &grp->qp_list);
-	list_add(&elem->grp_list, &qp->grp_list);
 
 	err = 0;
 out:
 	spin_unlock_bh(&grp->mcg_lock);
-	spin_unlock_bh(&qp->grp_lock);
 	return err;
 }
 
@@ -136,18 +132,15 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	if (!grp)
 		goto err1;
 
-	spin_lock_bh(&qp->grp_lock);
 	spin_lock_bh(&grp->mcg_lock);
 
 	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
 			list_del(&elem->qp_list);
-			list_del(&elem->grp_list);
 			grp->num_qp--;
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
-			spin_unlock_bh(&qp->grp_lock);
 			rxe_drop_ref(elem);
 			rxe_drop_ref(grp);	/* ref held by QP */
 			rxe_drop_ref(grp);	/* ref from get_key */
@@ -156,7 +149,6 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	spin_unlock_bh(&grp->mcg_lock);
-	spin_unlock_bh(&qp->grp_lock);
 	rxe_drop_ref(grp);			/* ref from get_key */
 err1:
 	return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index a21d704dc376..58ccca96c209 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -188,9 +188,6 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
 		break;
 	}
 
-	INIT_LIST_HEAD(&qp->grp_list);
-
-	spin_lock_init(&qp->grp_lock);
 	spin_lock_init(&qp->state_lock);
 
 	atomic_set(&qp->ssn, 0);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 4910d0782e33..55f8ed2bc621 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -232,9 +232,6 @@ struct rxe_qp {
 	struct rxe_av		pri_av;
 	struct rxe_av		alt_av;
 
-	/* list of mcast groups qp has joined (for cleanup) */
-	struct list_head	grp_list;
-	spinlock_t		grp_lock; /* guard grp_list */
 	atomic_t		mcg_num;
 
 	struct sk_buff_head	req_pkts;
@@ -368,9 +365,7 @@ struct rxe_mcg {
 struct rxe_mca {
 	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
-	struct list_head	grp_list;
 	struct rxe_qp		*qp;
-	struct rxe_mcg		*grp;
 };
 
 struct rxe_port {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (5 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 06/17] RDMA/rxe: Remove qp->grp_lock and qp->grp_list Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-02-01  1:03   ` kernel test robot
                     ` (2 more replies)
  2022-01-31 22:08 ` [PATCH for-next v10 08/17] RDMA/rxe: Rename grp to mcg and mce to mca Bob Pearson
                   ` (10 subsequent siblings)
  17 siblings, 3 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Remove rxe_mca (was rxe_mc_elem) from rxe pools and use kzmalloc
and kfree to allocate and free. Use the sequence

    <lookup qp>
    new_mca = kzalloc(sizeof(*new_mca), GFP_KERNEL);
    <spin lock>
    <lookup qp again> /* in case of a race */
    <init new_mca>
    <spin unlock>

instead of GFP_ATOMIC inside of the spinlock. Add an extra reference
to multicast group to protect the pointer in the index that maps
mgid to group.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   8 --
 drivers/infiniband/sw/rxe/rxe_mcast.c | 102 +++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_pool.c  |   5 --
 drivers/infiniband/sw/rxe/rxe_pool.h  |   1 -
 drivers/infiniband/sw/rxe/rxe_verbs.h |   2 -
 5 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index fab291245366..c55736e441e7 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 	rxe_pool_cleanup(&rxe->mc_grp_pool);
-	rxe_pool_cleanup(&rxe->mc_elem_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err9;
 
-	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
-			    rxe->attr.max_total_mcast_qp_attach);
-	if (err)
-		goto err10;
-
 	return 0;
 
-err10:
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 err9:
 	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 9336295c4ee2..4a5896a225a6 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,30 +26,40 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold mc_grp_pool->pool_lock */
-static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
-				     struct rxe_pool *pool,
-				     union ib_gid *mgid)
+static int __rxe_create_grp(struct rxe_dev *rxe, struct rxe_pool *pool,
+			    union ib_gid *mgid, struct rxe_mcg **grp_p)
 {
 	int err;
 	struct rxe_mcg *grp;
 
 	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
 	if (!grp)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
+
+	err = rxe_mcast_add(rxe, mgid);
+	if (unlikely(err)) {
+		rxe_drop_ref(grp);
+		return err;
+	}
 
 	INIT_LIST_HEAD(&grp->qp_list);
 	spin_lock_init(&grp->mcg_lock);
 	grp->rxe = rxe;
+
+	rxe_add_ref(grp);
 	rxe_add_key_locked(grp, mgid);
 
-	err = rxe_mcast_add(rxe, mgid);
-	if (unlikely(err)) {
-		rxe_drop_key_locked(grp);
-		rxe_drop_ref(grp);
-		return ERR_PTR(err);
-	}
+	*grp_p = grp;
+	return 0;
+}
+
+/* caller is holding a ref from lookup and mcg->mcg_lock*/
+void __rxe_destroy_mcg(struct rxe_mcg *grp)
+{
+	rxe_drop_key(grp);
+	rxe_drop_ref(grp);
 
-	return grp;
+	rxe_mcast_delete(grp->rxe, &grp->mgid);
 }
 
 static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
@@ -68,10 +78,9 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (grp)
 		goto done;
 
-	grp = create_grp(rxe, pool, mgid);
-	if (IS_ERR(grp)) {
+	err = __rxe_create_grp(rxe, pool, mgid, &grp);
+	if (err) {
 		write_unlock_bh(&pool->pool_lock);
-		err = PTR_ERR(grp);
 		return err;
 	}
 
@@ -85,36 +94,44 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mcg *grp)
 {
 	int err;
-	struct rxe_mca *elem;
+	struct rxe_mca *mca, *new_mca;
 
-	/* check to see of the qp is already a member of the group */
+	/* check to see if the qp is already a member of the group */
 	spin_lock_bh(&grp->mcg_lock);
-	list_for_each_entry(elem, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
+	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			spin_unlock_bh(&grp->mcg_lock);
+			return 0;
+		}
+	}
+	spin_unlock_bh(&grp->mcg_lock);
+
+	/* speculative alloc new mca without using GFP_ATOMIC */
+	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!new_mca)
+		return -ENOMEM;
+
+	spin_lock_bh(&grp->mcg_lock);
+	/* re-check to see if someone else just attached qp */
+	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			kfree(new_mca);
 			err = 0;
 			goto out;
 		}
 	}
+	mca = new_mca;
 
 	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	elem = rxe_alloc_locked(&rxe->mc_elem_pool);
-	if (!elem) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	/* each qp holds a ref on the grp */
-	rxe_add_ref(grp);
-
 	grp->num_qp++;
-	elem->qp = qp;
+	mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&elem->qp_list, &grp->qp_list);
+	list_add(&mca->qp_list, &grp->qp_list);
 
 	err = 0;
 out:
@@ -126,7 +143,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
 	struct rxe_mcg *grp;
-	struct rxe_mca *elem, *tmp;
+	struct rxe_mca *mca, *tmp;
 
 	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
 	if (!grp)
@@ -134,33 +151,30 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	spin_lock_bh(&grp->mcg_lock);
 
-	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
-			list_del(&elem->qp_list);
+	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			list_del(&mca->qp_list);
 			grp->num_qp--;
+			if (grp->num_qp <= 0)
+				__rxe_destroy_mcg(grp);
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
-			rxe_drop_ref(elem);
-			rxe_drop_ref(grp);	/* ref held by QP */
-			rxe_drop_ref(grp);	/* ref from get_key */
+			rxe_drop_ref(grp);
+			kfree(mca);
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&grp->mcg_lock);
-	rxe_drop_ref(grp);			/* ref from get_key */
+	rxe_drop_ref(grp);
 err1:
 	return -EINVAL;
 }
 
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
-	struct rxe_dev *rxe = grp->rxe;
-
-	rxe_drop_key(grp);
-	rxe_mcast_delete(rxe, &grp->mgid);
+	/* nothing left to do */
 }
 
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
@@ -170,13 +184,15 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *grp;
 
-	/* takes a ref on grp if successful */
 	err = rxe_mcast_get_grp(rxe, mgid, &grp);
 	if (err)
 		return err;
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
 
+	if (grp->num_qp == 0)
+		__rxe_destroy_mcg(grp);
+
 	rxe_drop_ref(grp);
 	return err;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 63c594173565..a6756aa93e2b 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -90,11 +90,6 @@ static const struct rxe_type_info {
 		.key_offset	= offsetof(struct rxe_mcg, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
-	[RXE_TYPE_MC_ELEM] = {
-		.name		= "rxe-mc_elem",
-		.size		= sizeof(struct rxe_mca),
-		.elem_offset	= offsetof(struct rxe_mca, elem),
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 214279310f4d..511f81554fd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -23,7 +23,6 @@ enum rxe_elem_type {
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
 	RXE_TYPE_MC_GRP,
-	RXE_TYPE_MC_ELEM,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 55f8ed2bc621..02745d51c163 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -363,7 +363,6 @@ struct rxe_mcg {
 };
 
 struct rxe_mca {
-	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct rxe_qp		*qp;
 };
@@ -397,7 +396,6 @@ struct rxe_dev {
 	struct rxe_pool		mr_pool;
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
-	struct rxe_pool		mc_elem_pool;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 08/17] RDMA/rxe: Rename grp to mcg and mce to mca
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (6 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 09/17] RDMA/rxe: Introduce RXECB(skb) Bob Pearson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

In rxe_mcast.c and rxe_recv.c replace 'grp' by 'mcg' and 'mce' by 'mca'.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 104 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 2 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 4a5896a225a6..29e6c9e11c77 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,47 +26,47 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold mc_grp_pool->pool_lock */
-static int __rxe_create_grp(struct rxe_dev *rxe, struct rxe_pool *pool,
-			    union ib_gid *mgid, struct rxe_mcg **grp_p)
+static int __rxe_create_mcg(struct rxe_dev *rxe, struct rxe_pool *pool,
+			    union ib_gid *mgid, struct rxe_mcg **mcg_p)
 {
 	int err;
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 
-	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
-	if (!grp)
+	mcg = rxe_alloc_locked(&rxe->mc_grp_pool);
+	if (!mcg)
 		return -ENOMEM;
 
 	err = rxe_mcast_add(rxe, mgid);
 	if (unlikely(err)) {
-		rxe_drop_ref(grp);
+		rxe_drop_ref(mcg);
 		return err;
 	}
 
-	INIT_LIST_HEAD(&grp->qp_list);
-	spin_lock_init(&grp->mcg_lock);
-	grp->rxe = rxe;
+	INIT_LIST_HEAD(&mcg->qp_list);
+	spin_lock_init(&mcg->mcg_lock);
+	mcg->rxe = rxe;
 
-	rxe_add_ref(grp);
-	rxe_add_key_locked(grp, mgid);
+	rxe_add_ref(mcg);
+	rxe_add_key_locked(mcg, mgid);
 
-	*grp_p = grp;
+	*mcg_p = mcg;
 	return 0;
 }
 
 /* caller is holding a ref from lookup and mcg->mcg_lock*/
-void __rxe_destroy_mcg(struct rxe_mcg *grp)
+void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
-	rxe_drop_key(grp);
-	rxe_drop_ref(grp);
+	rxe_drop_key(mcg);
+	rxe_drop_ref(mcg);
 
-	rxe_mcast_delete(grp->rxe, &grp->mgid);
+	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
 }
 
-static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mcg **grp_p)
+static int rxe_mcast_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
+			     struct rxe_mcg **mcg_p)
 {
 	int err;
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
@@ -74,11 +74,11 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	write_lock_bh(&pool->pool_lock);
 
-	grp = rxe_pool_get_key_locked(pool, mgid);
-	if (grp)
+	mcg = rxe_pool_get_key_locked(pool, mgid);
+	if (mcg)
 		goto done;
 
-	err = __rxe_create_grp(rxe, pool, mgid, &grp);
+	err = __rxe_create_mcg(rxe, pool, mgid, &mcg);
 	if (err) {
 		write_unlock_bh(&pool->pool_lock);
 		return err;
@@ -86,34 +86,34 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 done:
 	write_unlock_bh(&pool->pool_lock);
-	*grp_p = grp;
+	*mcg_p = mcg;
 	return 0;
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mcg *grp)
+			   struct rxe_mcg *mcg)
 {
 	int err;
 	struct rxe_mca *mca, *new_mca;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&grp->mcg_lock);
-	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+	spin_lock_bh(&mcg->mcg_lock);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&grp->mcg_lock);
+			spin_unlock_bh(&mcg->mcg_lock);
 			return 0;
 		}
 	}
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&mcg->mcg_lock);
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
 	if (!new_mca)
 		return -ENOMEM;
 
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&mcg->mcg_lock);
 	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			kfree(new_mca);
 			err = 0;
@@ -122,52 +122,52 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 	mca = new_mca;
 
-	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	grp->num_qp++;
+	mcg->num_qp++;
 	mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&mca->qp_list, &grp->qp_list);
+	list_add(&mca->qp_list, &mcg->qp_list);
 
 	err = 0;
 out:
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&mcg->mcg_lock);
 	return err;
 }
 
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_mca *mca, *tmp;
 
-	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
-	if (!grp)
+	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	if (!mcg)
 		goto err1;
 
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&mcg->mcg_lock);
 
-	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
-			grp->num_qp--;
-			if (grp->num_qp <= 0)
-				__rxe_destroy_mcg(grp);
+			mcg->num_qp--;
+			if (mcg->num_qp <= 0)
+				__rxe_destroy_mcg(mcg);
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&grp->mcg_lock);
-			rxe_drop_ref(grp);
+			spin_unlock_bh(&mcg->mcg_lock);
+			rxe_drop_ref(mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
 
-	spin_unlock_bh(&grp->mcg_lock);
-	rxe_drop_ref(grp);
+	spin_unlock_bh(&mcg->mcg_lock);
+	rxe_drop_ref(mcg);
 err1:
 	return -EINVAL;
 }
@@ -182,18 +182,18 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
+	err = rxe_mcast_get_mcg(rxe, mgid, &mcg);
 	if (err)
 		return err;
 
-	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
 
-	if (grp->num_qp == 0)
-		__rxe_destroy_mcg(grp);
+	if (mcg->num_qp == 0)
+		__rxe_destroy_mcg(mcg);
 
-	rxe_drop_ref(grp);
+	rxe_drop_ref(mcg);
 	return err;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7ff6b53555f4..814a002b8911 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -234,7 +234,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
 	struct rxe_mcg *mcg;
-	struct rxe_mca *mce;
+	struct rxe_mca *mca;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
 	int err;
@@ -257,8 +257,8 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	 * single QP happen and just move on and try
 	 * the rest of them on the list
 	 */
-	list_for_each_entry(mce, &mcg->qp_list, qp_list) {
-		qp = mce->qp;
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		qp = mca->qp;
 
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
@@ -273,7 +273,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		 * skb and pass to the QP. Pass the original skb to
 		 * the last QP in the list.
 		 */
-		if (mce->qp_list.next != &mcg->qp_list) {
+		if (mca->qp_list.next != &mcg->qp_list) {
 			struct sk_buff *cskb;
 			struct rxe_pkt_info *cpkt;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 09/17] RDMA/rxe: Introduce RXECB(skb)
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (7 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 08/17] RDMA/rxe: Rename grp to mcg and mce to mca Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 10/17] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases Bob Pearson
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add a #define RXECB(skb) to rxe_hdr.h as a short cut to
refer to single members of rxe_pkt_info which is stored in skb->cb
in the receive path. Use this to make some cleanups in rxe_recv.c

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_hdr.h  |  3 ++
 drivers/infiniband/sw/rxe/rxe_recv.c | 55 +++++++++++++---------------
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..2a85d1e40e6a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -36,6 +36,9 @@ static inline struct sk_buff *PKT_TO_SKB(struct rxe_pkt_info *pkt)
 	return container_of((void *)pkt, struct sk_buff, cb);
 }
 
+/* alternative to access a single element of rxe_pkt_info from skb */
+#define RXECB(skb) ((struct rxe_pkt_info *)((skb)->cb))
+
 /*
  * IBA header types and methods
  *
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 814a002b8911..10020103ea4a 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -107,17 +107,15 @@ static int check_keys(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
 	return -EINVAL;
 }
 
-static int check_addr(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+static int check_addr(struct rxe_dev *rxe, struct sk_buff *skb,
 		      struct rxe_qp *qp)
 {
-	struct sk_buff *skb = PKT_TO_SKB(pkt);
-
 	if (qp_type(qp) != IB_QPT_RC && qp_type(qp) != IB_QPT_UC)
 		goto done;
 
-	if (unlikely(pkt->port_num != qp->attr.port_num)) {
+	if (unlikely(RXECB(skb)->port_num != qp->attr.port_num)) {
 		pr_warn_ratelimited("port %d != qp port %d\n",
-				    pkt->port_num, qp->attr.port_num);
+				    RXECB(skb)->port_num, qp->attr.port_num);
 		goto err1;
 	}
 
@@ -167,8 +165,9 @@ static int check_addr(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
 	return -EINVAL;
 }
 
-static int hdr_check(struct rxe_pkt_info *pkt)
+static int hdr_check(struct sk_buff *skb)
 {
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	struct rxe_dev *rxe = pkt->rxe;
 	struct rxe_port *port = &rxe->port;
 	struct rxe_qp *qp = NULL;
@@ -199,7 +198,7 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 		if (unlikely(err))
 			goto err2;
 
-		err = check_addr(rxe, pkt, qp);
+		err = check_addr(rxe, skb, qp);
 		if (unlikely(err))
 			goto err2;
 
@@ -222,17 +221,19 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 	return -EINVAL;
 }
 
-static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static inline void rxe_rcv_pkt(struct sk_buff *skb)
 {
-	if (pkt->mask & RXE_REQ_MASK)
-		rxe_resp_queue_pkt(pkt->qp, skb);
+	if (RXECB(skb)->mask & RXE_REQ_MASK)
+		rxe_resp_queue_pkt(RXECB(skb)->qp, skb);
 	else
-		rxe_comp_queue_pkt(pkt->qp, skb);
+		rxe_comp_queue_pkt(RXECB(skb)->qp, skb);
 }
 
-static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
+static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 {
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct sk_buff *s;
+	struct rxe_pkt_info *pkt = RXECB(skb);
+	struct rxe_dev *rxe = pkt->rxe;
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
@@ -274,26 +275,22 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		 * the last QP in the list.
 		 */
 		if (mca->qp_list.next != &mcg->qp_list) {
-			struct sk_buff *cskb;
-			struct rxe_pkt_info *cpkt;
-
-			cskb = skb_clone(skb, GFP_ATOMIC);
-			if (unlikely(!cskb))
+			s = skb_clone(skb, GFP_ATOMIC);
+			if (unlikely(!s))
 				continue;
 
 			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
-				kfree_skb(cskb);
+				kfree_skb(s);
 				break;
 			}
 
-			cpkt = SKB_TO_PKT(cskb);
-			cpkt->qp = qp;
+			RXECB(s)->qp = qp;
 			rxe_add_ref(qp);
-			rxe_rcv_pkt(cpkt, cskb);
+			rxe_rcv_pkt(s);
 		} else {
-			pkt->qp = qp;
+			RXECB(skb)->qp = qp;
 			rxe_add_ref(qp);
-			rxe_rcv_pkt(pkt, skb);
+			rxe_rcv_pkt(skb);
 			skb = NULL;	/* mark consumed */
 		}
 	}
@@ -326,7 +323,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
  */
 static int rxe_chk_dgid(struct rxe_dev *rxe, struct sk_buff *skb)
 {
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	const struct ib_gid_attr *gid_attr;
 	union ib_gid dgid;
 	union ib_gid *pdgid;
@@ -359,7 +356,7 @@ static int rxe_chk_dgid(struct rxe_dev *rxe, struct sk_buff *skb)
 void rxe_rcv(struct sk_buff *skb)
 {
 	int err;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	struct rxe_dev *rxe = pkt->rxe;
 
 	if (unlikely(skb->len < RXE_BTH_BYTES))
@@ -378,7 +375,7 @@ void rxe_rcv(struct sk_buff *skb)
 	if (unlikely(skb->len < header_size(pkt)))
 		goto drop;
 
-	err = hdr_check(pkt);
+	err = hdr_check(skb);
 	if (unlikely(err))
 		goto drop;
 
@@ -389,9 +386,9 @@ void rxe_rcv(struct sk_buff *skb)
 	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
 
 	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
-		rxe_rcv_mcast_pkt(rxe, skb);
+		rxe_rcv_mcast_pkt(skb);
 	else
-		rxe_rcv_pkt(pkt, skb);
+		rxe_rcv_pkt(skb);
 
 	return;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 10/17] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (8 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 09/17] RDMA/rxe: Introduce RXECB(skb) Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 11/17] RDMA/rxe: Replace mcg locks by rxe->mcg_lock Bob Pearson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe_rcv_mcast_pkt performs most of its work under the
mcg->mcg_lock and calls into rxe_rcv which queues the packets
to the responder and completer tasklets holding the lock which is
a very bad idea. This patch walks the qp_list in mcg and copies the
qp addresses to a dynamically allocated array under the lock but
does the rest of the work without holding the lock. The critical
section is now very small.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 11 ++++---
 drivers/infiniband/sw/rxe/rxe_recv.c  | 41 +++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 29e6c9e11c77..52bd46ca22c9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -122,16 +122,16 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 	mca = new_mca;
 
-	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (atomic_read(&mcg->qp_num) >= rxe->attr.max_mcast_qp_attach) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	mcg->num_qp++;
+	atomic_inc(&mcg->qp_num);
 	mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&mca->qp_list, &mcg->qp_list);
+	list_add_tail(&mca->qp_list, &mcg->qp_list);
 
 	err = 0;
 out:
@@ -154,8 +154,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
-			mcg->num_qp--;
-			if (mcg->num_qp <= 0)
+			if (atomic_dec_return(&mcg->qp_num) <= 0)
 				__rxe_destroy_mcg(mcg);
 			atomic_dec(&qp->mcg_num);
 
@@ -190,7 +189,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
 
-	if (mcg->num_qp == 0)
+	if (atomic_read(&mcg->qp_num) == 0)
 		__rxe_destroy_mcg(mcg);
 
 	rxe_drop_ref(mcg);
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 10020103ea4a..ed80125f1dc5 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -229,6 +229,11 @@ static inline void rxe_rcv_pkt(struct sk_buff *skb)
 		rxe_comp_queue_pkt(RXECB(skb)->qp, skb);
 }
 
+/* split processing of the qp list into two stages.
+ * first just make a simple linear array from the
+ * current list while holding the lock and then
+ * process each qp without holding the lock.
+ */
 static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 {
 	struct sk_buff *s;
@@ -237,7 +242,9 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
+	struct rxe_qp **qp_array;
 	union ib_gid dgid;
+	int n, nmax;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -251,15 +258,32 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
+	/* this is the current number of qp's attached to mcg plus a
+	 * little room in case new qp's are attached. It isn't wrong
+	 * to miss some qp's since it is just a matter of precisely
+	 * when the packet is assumed to be received.
+	 */
+	nmax = atomic_read(&mcg->qp_num) + 2;
+	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
+
+	n = 0;
 	spin_lock_bh(&mcg->mcg_lock);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		rxe_add_ref(mca->qp);
+		qp_array[n++] = mca->qp;
+		if (n == nmax)
+			break;
+	}
+	spin_unlock_bh(&mcg->mcg_lock);
+	nmax = n;
 
 	/* this is unreliable datagram service so we let
 	 * failures to deliver a multicast packet to a
 	 * single QP happen and just move on and try
 	 * the rest of them on the list
 	 */
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
-		qp = mca->qp;
+	for (n = 0; n < nmax; n++) {
+		qp = qp_array[n];
 
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
@@ -274,28 +298,27 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 		 * skb and pass to the QP. Pass the original skb to
 		 * the last QP in the list.
 		 */
-		if (mca->qp_list.next != &mcg->qp_list) {
-			s = skb_clone(skb, GFP_ATOMIC);
+		if (n < nmax - 1) {
+			s = skb_clone(skb, GFP_KERNEL);
 			if (unlikely(!s))
 				continue;
 
+			RXECB(s)->qp = qp;
 			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
+				rxe_drop_ref(RXECB(s)->qp);
 				kfree_skb(s);
-				break;
+				continue;
 			}
 
-			RXECB(s)->qp = qp;
-			rxe_add_ref(qp);
 			rxe_rcv_pkt(s);
 		} else {
 			RXECB(skb)->qp = qp;
-			rxe_add_ref(qp);
 			rxe_rcv_pkt(skb);
 			skb = NULL;	/* mark consumed */
 		}
 	}
 
-	spin_unlock_bh(&mcg->mcg_lock);
+	kfree(qp_array);
 
 	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 02745d51c163..d65c358798c6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -356,8 +356,8 @@ struct rxe_mcg {
 	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
+	atomic_t		qp_num;
 	union ib_gid		mgid;
-	int			num_qp;
 	u32			qkey;
 	u16			pkey;
 };
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 11/17] RDMA/rxe: Replace mcg locks by rxe->mcg_lock
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (9 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 10/17] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 12/17] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Starting to decouple mcg from rxe pools, replace the spin lock
mcg->mcg_lock and the read/write lock pool->pool_lock by rxe->mcg_lock.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_mcast.c | 25 ++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  3 ++-
 4 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index c55736e441e7..46a07e2d9dcf 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -198,6 +198,8 @@ static int rxe_init(struct rxe_dev *rxe)
 	if (err)
 		return err;
 
+	spin_lock_init(&rxe->mcg_lock);
+
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
 	spin_lock_init(&rxe->pending_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 52bd46ca22c9..d35070777214 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,7 +25,7 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold mc_grp_pool->pool_lock */
+/* caller should hold rxe->mcg_lock */
 static int __rxe_create_mcg(struct rxe_dev *rxe, struct rxe_pool *pool,
 			    union ib_gid *mgid, struct rxe_mcg **mcg_p)
 {
@@ -43,7 +43,6 @@ static int __rxe_create_mcg(struct rxe_dev *rxe, struct rxe_pool *pool,
 	}
 
 	INIT_LIST_HEAD(&mcg->qp_list);
-	spin_lock_init(&mcg->mcg_lock);
 	mcg->rxe = rxe;
 
 	rxe_add_ref(mcg);
@@ -72,7 +71,7 @@ static int rxe_mcast_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (rxe->attr.max_mcast_qp_attach == 0)
 		return -EINVAL;
 
-	write_lock_bh(&pool->pool_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	mcg = rxe_pool_get_key_locked(pool, mgid);
 	if (mcg)
@@ -80,12 +79,12 @@ static int rxe_mcast_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	err = __rxe_create_mcg(rxe, pool, mgid, &mcg);
 	if (err) {
-		write_unlock_bh(&pool->pool_lock);
+		spin_unlock_bh(&rxe->mcg_lock);
 		return err;
 	}
 
 done:
-	write_unlock_bh(&pool->pool_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	*mcg_p = mcg;
 	return 0;
 }
@@ -97,21 +96,21 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *mca, *new_mca;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&mcg->mcg_lock);
+			spin_unlock_bh(&rxe->mcg_lock);
 			return 0;
 		}
 	}
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
 	if (!new_mca)
 		return -ENOMEM;
 
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	/* re-check to see if someone else just attached qp */
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
@@ -135,7 +134,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	err = 0;
 out:
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	return err;
 }
 
@@ -149,7 +148,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	if (!mcg)
 		goto err1;
 
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
@@ -158,14 +157,14 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				__rxe_destroy_mcg(mcg);
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&mcg->mcg_lock);
+			spin_unlock_bh(&rxe->mcg_lock);
 			rxe_drop_ref(mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
 
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	rxe_drop_ref(mcg);
 err1:
 	return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index ed80125f1dc5..9a45743c8eaa 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -267,14 +267,14 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
 
 	n = 0;
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		rxe_add_ref(mca->qp);
 		qp_array[n++] = mca->qp;
 		if (n == nmax)
 			break;
 	}
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	nmax = n;
 
 	/* this is unreliable datagram service so we let
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d65c358798c6..b72f8f09d984 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,7 +353,6 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
-	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
@@ -397,6 +396,8 @@ struct rxe_dev {
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
 
+	spinlock_t		mcg_lock; /* guard multicast groups */
+
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 12/17] RDMA/rxe: Replace pool key by rxe->mcg_tree
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (10 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 11/17] RDMA/rxe: Replace mcg locks by rxe->mcg_lock Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 13/17] RDMA/rxe: Remove key'ed object support Bob Pearson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Continuing to decouple mcg from rxe pools. Create red-black tree code
in rxe_mcast.c to hold mcg index. Replace pool key calls by calls
to local red-black routines.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   1 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 233 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_pool.c  |   6 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  |   4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   3 +
 6 files changed, 197 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 46a07e2d9dcf..310e184ae9e8 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -199,6 +199,7 @@ static int rxe_init(struct rxe_dev *rxe)
 		return err;
 
 	spin_lock_init(&rxe->mcg_lock);
+	rxe->mcg_tree = RB_ROOT;
 
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index af40e3c212fb..bd701af7758c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,7 +40,7 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index d35070777214..82669d14d8a9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,68 +25,189 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold rxe->mcg_lock */
-static int __rxe_create_mcg(struct rxe_dev *rxe, struct rxe_pool *pool,
-			    union ib_gid *mgid, struct rxe_mcg **mcg_p)
+/**
+ * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
+ * @mcg: mcg object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * is responsible to avoid adding the same mcg twice to the tree.
+ */
+static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 {
-	int err;
+	struct rb_root *tree = &mcg->rxe->mcg_tree;
+	struct rb_node **link = &tree->rb_node;
+	struct rb_node *node = NULL;
+	struct rxe_mcg *tmp;
+	int cmp;
+
+	while (*link) {
+		node = *link;
+		tmp = rb_entry(node, struct rxe_mcg, node);
+
+		cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid));
+		if (cmp > 0)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&mcg->node, node, link);
+	rb_insert_color(&mcg->node, tree);
+}
+
+/**
+ * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
+ * @mcg: mcast group object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_remove_mcg(struct rxe_mcg *mcg)
+{
+	rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
+}
+
+/**
+ * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Context: caller must hold rxe->mcg_lock
+ * Returns: mcg on success and takes a ref to mcg else NULL
+ */
+static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
+					union ib_gid *mgid)
+{
+	struct rb_root *tree = &rxe->mcg_tree;
 	struct rxe_mcg *mcg;
+	struct rb_node *node;
+	int cmp;
 
-	mcg = rxe_alloc_locked(&rxe->mc_grp_pool);
-	if (!mcg)
-		return -ENOMEM;
+	node = tree->rb_node;
+
+	while (node) {
+		mcg = rb_entry(node, struct rxe_mcg, node);
+
+		cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));
+
+		if (cmp > 0)
+			node = node->rb_left;
+		else if (cmp < 0)
+			node = node->rb_right;
+		else
+			break;
+	}
+
+	if (node) {
+		rxe_add_ref(mcg);
+		return mcg;
+	}
+
+	return NULL;
+}
+
+/**
+ * rxe_lookup_mcg - lookup up mcg in red-back tree
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Returns: mcg if found else NULL
+ */
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	struct rxe_mcg *mcg;
+
+	spin_lock_bh(&rxe->mcg_lock);
+	mcg = __rxe_lookup_mcg(rxe, mgid);
+	spin_unlock_bh(&rxe->mcg_lock);
+
+	return mcg;
+}
+
+/**
+ * __rxe_init_mcg - initialize a new mcg
+ * @rxe: rxe device
+ * @mgid: multicast address as a gid
+ * @mcg: new mcg object
+ *
+ * Context: caller should hold rxe->mcg lock
+ * Returns: 0 on success else an error
+ */
+static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
+			  struct rxe_mcg *mcg)
+{
+	int err;
 
 	err = rxe_mcast_add(rxe, mgid);
-	if (unlikely(err)) {
-		rxe_drop_ref(mcg);
+	if (unlikely(err))
 		return err;
-	}
 
+	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
 
 	rxe_add_ref(mcg);
-	rxe_add_key_locked(mcg, mgid);
+	__rxe_insert_mcg(mcg);
 
-	*mcg_p = mcg;
-	return 0;
-}
 
-/* caller is holding a ref from lookup and mcg->mcg_lock*/
-void __rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	rxe_drop_key(mcg);
-	rxe_drop_ref(mcg);
-
-	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
+	return 0;
 }
 
-static int rxe_mcast_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mcg **mcg_p)
+/**
+ * rxe_get_mcg - lookup or allocate a mcg
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ * @mcgp: address of returned mcg value
+ *
+ * Returns: 0 and sets *mcgp to mcg on success else an error
+ */
+static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
+		       struct rxe_mcg **mcgp)
 {
-	int err;
-	struct rxe_mcg *mcg;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
+	struct rxe_mcg *mcg, *tmp;
+	int err;
 
-	if (rxe->attr.max_mcast_qp_attach == 0)
+	if (rxe->attr.max_mcast_grp == 0)
 		return -EINVAL;
 
-	spin_lock_bh(&rxe->mcg_lock);
+	/* check to see if mcg already exists */
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (mcg) {
+		*mcgp = mcg;
+		return 0;
+	}
 
-	mcg = rxe_pool_get_key_locked(pool, mgid);
-	if (mcg)
-		goto done;
+	/* speculative alloc of mcg */
+	mcg = rxe_alloc(pool);
+	if (!mcg)
+		return -ENOMEM;
 
-	err = __rxe_create_mcg(rxe, pool, mgid, &mcg);
-	if (err) {
-		spin_unlock_bh(&rxe->mcg_lock);
-		return err;
+	spin_lock_bh(&rxe->mcg_lock);
+	/* re-check to see if someone else just added it */
+	tmp = __rxe_lookup_mcg(rxe, mgid);
+	if (tmp) {
+		rxe_drop_ref(mcg);
+		mcg = tmp;
+		goto out;
 	}
 
-done:
+	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
+		err = -ENOMEM;
+		goto err_dec;
+	}
+
+	err = __rxe_init_mcg(rxe, mgid, mcg);
+	if (err)
+		goto err_dec;
+out:
 	spin_unlock_bh(&rxe->mcg_lock);
-	*mcg_p = mcg;
+	*mcgp = mcg;
 	return 0;
+err_dec:
+	atomic_dec(&rxe->mcg_num);
+	spin_unlock_bh(&rxe->mcg_lock);
+	rxe_drop_ref(mcg);
+	return err;
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
@@ -138,13 +259,42 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
+/**
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * @mcg: the mcg object
+ *
+ * Context: caller is holding rxe->mcg_lock, all refs to mcg are dropped
+ * no qp's are attached to mcg
+ */
+void __rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	__rxe_remove_mcg(mcg);
+
+	rxe_drop_ref(mcg);
+
+	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
+}
+
+/**
+ * rxe_destroy_mcg - destroy mcg object
+ * @mcg: the mcg object
+ *
+ * Context: all refs to mcg are dropped, no qp's are attached to mcg
+ */
+static void rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	spin_lock_bh(&mcg->rxe->mcg_lock);
+	__rxe_destroy_mcg(mcg);
+	spin_unlock_bh(&mcg->rxe->mcg_lock);
+}
+
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca, *tmp;
 
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	mcg = rxe_lookup_mcg(rxe, mgid);
 	if (!mcg)
 		goto err1;
 
@@ -170,11 +320,6 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return -EINVAL;
 }
 
-void rxe_mc_cleanup(struct rxe_pool_elem *elem)
-{
-	/* nothing left to do */
-}
-
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	int err;
@@ -182,14 +327,14 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mcg *mcg;
 
-	err = rxe_mcast_get_mcg(rxe, mgid, &mcg);
+	err = rxe_get_mcg(rxe, mgid, &mcg);
 	if (err)
 		return err;
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
 
 	if (atomic_read(&mcg->qp_num) == 0)
-		__rxe_destroy_mcg(mcg);
+		rxe_destroy_mcg(mcg);
 
 	rxe_drop_ref(mcg);
 	return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index a6756aa93e2b..4eff95d57aa4 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -82,13 +82,9 @@ static const struct rxe_type_info {
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
 	[RXE_TYPE_MC_GRP] = {
-		.name		= "rxe-mc_grp",
+		.name		= "rxe-mcg",
 		.size		= sizeof(struct rxe_mcg),
 		.elem_offset	= offsetof(struct rxe_mcg, elem),
-		.cleanup	= rxe_mc_cleanup,
-		.flags		= RXE_POOL_KEY,
-		.key_offset	= offsetof(struct rxe_mcg, mgid),
-		.key_size	= sizeof(union ib_gid),
 	},
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 9a45743c8eaa..9a92e5a486ee 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -254,7 +254,7 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 		memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid));
 
 	/* lookup mcast group corresponding to mgid, takes a ref */
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
+	mcg = rxe_lookup_mcg(rxe, &dgid);
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
@@ -320,7 +320,7 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 
 	kfree(qp_array);
 
-	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
+	rxe_drop_ref(mcg);
 
 	if (likely(!skb))
 		return;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index b72f8f09d984..ea2d9ff29744 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,6 +353,7 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
+	struct rb_node		node;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
@@ -397,6 +398,8 @@ struct rxe_dev {
 	struct rxe_pool		mc_grp_pool;
 
 	spinlock_t		mcg_lock; /* guard multicast groups */
+	struct rb_root		mcg_tree;
+	atomic_t		mcg_num;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 13/17] RDMA/rxe: Remove key'ed object support
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (11 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 12/17] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 14/17] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Now that rxe_mcast.c has it's own red-black tree support there is no
longer any requirement for key'ed objects in rxe pools. This patch
removes the key APIs and related code.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 123 ---------------------------
 drivers/infiniband/sw/rxe/rxe_pool.h |  38 ---------
 2 files changed, 161 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 4eff95d57aa4..fe0fcca47d8d 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -16,8 +16,6 @@ static const struct rxe_type_info {
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
-	size_t key_offset;
-	size_t key_size;
 } rxe_type_info[RXE_NUM_TYPES] = {
 	[RXE_TYPE_UC] = {
 		.name		= "rxe-uc",
@@ -143,12 +141,6 @@ int rxe_pool_init(
 			goto out;
 	}
 
-	if (pool->flags & RXE_POOL_KEY) {
-		pool->key.tree = RB_ROOT;
-		pool->key.key_offset = info->key_offset;
-		pool->key.key_size = info->key_size;
-	}
-
 out:
 	return err;
 }
@@ -205,77 +197,6 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 	return 0;
 }
 
-static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->key.tree.rb_node;
-	struct rb_node *parent = NULL;
-	struct rxe_pool_elem *elem;
-	int cmp;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     (u8 *)new + pool->key.key_offset,
-			     pool->key.key_size);
-
-		if (cmp == 0) {
-			pr_warn("key already exists!\n");
-			return -EINVAL;
-		}
-
-		if (cmp > 0)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->key_node, parent, link);
-	rb_insert_color(&new->key_node, &pool->key.tree);
-
-	return 0;
-}
-
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size);
-	err = rxe_insert_key(pool, elem);
-
-	return err;
-}
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_key_locked(elem, key);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	rb_erase(&elem->key_node, &pool->key.tree);
-}
-
-void __rxe_drop_key(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_key_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-}
-
 int __rxe_add_index_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
@@ -439,47 +360,3 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 
 	return obj;
 }
-
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
-{
-	struct rb_node *node;
-	struct rxe_pool_elem *elem;
-	void *obj;
-	int cmp;
-
-	node = pool->key.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     key, pool->key.key_size);
-
-		if (cmp > 0)
-			node = node->rb_left;
-		else if (cmp < 0)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
-
-	return obj;
-}
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
-{
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_key_locked(pool, key);
-	read_unlock_bh(&pool->pool_lock);
-
-	return obj;
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 511f81554fd1..b6de415e10d2 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -9,7 +9,6 @@
 
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_KEY		= BIT(2),
 	RXE_POOL_NO_ALLOC	= BIT(4),
 };
 
@@ -32,9 +31,6 @@ struct rxe_pool_elem {
 	struct kref		ref_cnt;
 	struct list_head	list;
 
-	/* only used if keyed */
-	struct rb_node		key_node;
-
 	/* only used if indexed */
 	struct rb_node		index_node;
 	u32			index;
@@ -61,13 +57,6 @@ struct rxe_pool {
 		u32			max_index;
 		u32			min_index;
 	} index;
-
-	/* only used if keyed */
-	struct {
-		struct rb_root		tree;
-		size_t			key_offset;
-		size_t			key_size;
-	} key;
 };
 
 /* initialize a pool of objects with given limit on
@@ -112,26 +101,6 @@ void __rxe_drop_index(struct rxe_pool_elem *elem);
 
 #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
-/* assign a key to a keyed object and insert object into
- * pool's rb tree holding and not holding pool_lock
- */
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key)
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key)
-
-/* remove elem from rb tree holding and not holding the pool_lock */
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->elem)
-
-void __rxe_drop_key(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
-
 /* lookup an indexed object from index holding and not holding the pool_lock.
  * takes a reference on object
  */
@@ -139,13 +108,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
 
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* lookup keyed object from key holding and not holding the pool_lock.
- * takes a reference on the objecti
- */
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key);
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
-
 /* cleanup an object when all references are dropped */
 void rxe_elem_release(struct kref *kref);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 14/17] RDMA/rxe: Remove mcg from rxe pools
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (12 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 13/17] RDMA/rxe: Remove key'ed object support Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 15/17] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Finish removing mcg from rxe pools. Replace rxe pools ref counting by
kref's. Replace rxe_alloc by kzalloc.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  8 ---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mcast.c | 91 ++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_pool.c  |  5 --
 drivers/infiniband/sw/rxe/rxe_pool.h  |  1 -
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  4 +-
 7 files changed, 59 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 310e184ae9e8..c560d467a972 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -28,7 +28,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->cq_pool);
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -157,15 +156,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err8;
 
-	err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
-			    rxe->attr.max_mcast_grp);
-	if (err)
-		goto err9;
-
 	return 0;
 
-err9:
-	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
 	rxe_pool_cleanup(&rxe->mr_pool);
 err7:
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index bd701af7758c..409efeecd581 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -43,6 +43,7 @@ void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
+void rxe_cleanup_mcg(struct kref *kref);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 82669d14d8a9..ed23d0a270fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -98,7 +98,7 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	}
 
 	if (node) {
-		rxe_add_ref(mcg);
+		kref_get(&mcg->ref_cnt);
 		return mcg;
 	}
 
@@ -141,11 +141,13 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (unlikely(err))
 		return err;
 
+	kref_init(&mcg->ref_cnt);
 	memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
 	INIT_LIST_HEAD(&mcg->qp_list);
 	mcg->rxe = rxe;
+	mcg->index = rxe->mcg_next++;
 
-	rxe_add_ref(mcg);
+	kref_get(&mcg->ref_cnt);
 	__rxe_insert_mcg(mcg);
 
 
@@ -163,7 +165,6 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 		       struct rxe_mcg **mcgp)
 {
-	struct rxe_pool *pool = &rxe->mc_grp_pool;
 	struct rxe_mcg *mcg, *tmp;
 	int err;
 
@@ -178,7 +179,7 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	}
 
 	/* speculative alloc of mcg */
-	mcg = rxe_alloc(pool);
+	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
 	if (!mcg)
 		return -ENOMEM;
 
@@ -186,7 +187,7 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	/* re-check to see if someone else just added it */
 	tmp = __rxe_lookup_mcg(rxe, mgid);
 	if (tmp) {
-		rxe_drop_ref(mcg);
+		kfree(mcg);
 		mcg = tmp;
 		goto out;
 	}
@@ -206,10 +207,53 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 err_dec:
 	atomic_dec(&rxe->mcg_num);
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);
+	kfree(mcg);
 	return err;
 }
 
+/**
+ * rxe_cleanup_mcg - cleanup mcg for kref_put
+ * @kref:
+ *
+ * caller may or may not hold rxe->mcg_lock
+ */
+void rxe_cleanup_mcg(struct kref *kref)
+{
+	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+
+	kfree(mcg);
+}
+
+/**
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * @mcg: the mcg object
+ *
+ * Context: caller is holding rxe->mcg_lock, no qp's are attached to mcg
+ */
+void __rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	struct rxe_dev *rxe = mcg->rxe;
+
+	__rxe_remove_mcg(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
+	rxe_mcast_delete(rxe, &mcg->mgid);
+	atomic_dec(&rxe->mcg_num);
+}
+
+/**
+ * rxe_destroy_mcg - destroy mcg object
+ * @mcg: the mcg object
+ *
+ * Context: no qp's are attached to mcg
+ */
+static void rxe_destroy_mcg(struct rxe_mcg *mcg)
+{
+	spin_lock_bh(&mcg->rxe->mcg_lock);
+	__rxe_destroy_mcg(mcg);
+	spin_unlock_bh(&mcg->rxe->mcg_lock);
+}
+
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mcg *mcg)
 {
@@ -259,35 +303,6 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
-/**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
- * @mcg: the mcg object
- *
- * Context: caller is holding rxe->mcg_lock, all refs to mcg are dropped
- * no qp's are attached to mcg
- */
-void __rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	__rxe_remove_mcg(mcg);
-
-	rxe_drop_ref(mcg);
-
-	rxe_mcast_delete(mcg->rxe, &mcg->mgid);
-}
-
-/**
- * rxe_destroy_mcg - destroy mcg object
- * @mcg: the mcg object
- *
- * Context: all refs to mcg are dropped, no qp's are attached to mcg
- */
-static void rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-	spin_lock_bh(&mcg->rxe->mcg_lock);
-	__rxe_destroy_mcg(mcg);
-	spin_unlock_bh(&mcg->rxe->mcg_lock);
-}
-
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
@@ -308,14 +323,14 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&rxe->mcg_lock);
-			rxe_drop_ref(mcg);
+			kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 err1:
 	return -EINVAL;
 }
@@ -336,7 +351,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	if (atomic_read(&mcg->qp_num) == 0)
 		rxe_destroy_mcg(mcg);
 
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 	return err;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index fe0fcca47d8d..b6fe7c93aaab 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -79,11 +79,6 @@ static const struct rxe_type_info {
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
-	[RXE_TYPE_MC_GRP] = {
-		.name		= "rxe-mcg",
-		.size		= sizeof(struct rxe_mcg),
-		.elem_offset	= offsetof(struct rxe_mcg, elem),
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index b6de415e10d2..99b1eb04b405 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -21,7 +21,6 @@ enum rxe_elem_type {
 	RXE_TYPE_CQ,
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
-	RXE_TYPE_MC_GRP,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 9a92e5a486ee..04fe0cd36d6c 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -275,6 +275,8 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 			break;
 	}
 	spin_unlock_bh(&rxe->mcg_lock);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
 	nmax = n;
 
 	/* this is unreliable datagram service so we let
@@ -320,8 +322,6 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 
 	kfree(qp_array);
 
-	rxe_drop_ref(mcg);
-
 	if (likely(!skb))
 		return;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ea2d9ff29744..97d3a59e5c6f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,12 +352,13 @@ struct rxe_mw {
 };
 
 struct rxe_mcg {
-	struct rxe_pool_elem	elem;
 	struct rb_node		node;
+	struct kref		ref_cnt;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
 	union ib_gid		mgid;
+	unsigned int		index;
 	u32			qkey;
 	u16			pkey;
 };
@@ -400,6 +401,7 @@ struct rxe_dev {
 	spinlock_t		mcg_lock; /* guard multicast groups */
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
+	unsigned int		mcg_next;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 15/17] RDMA/rxe: Add code to cleanup mcast memory
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (13 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 14/17] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 16/17] RDMA/rxe: Add comments to rxe_mcast.c Bob Pearson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Well behaved applications will free all memory allocated by multicast
but programs which do not clean up properly can leave behind allocated
memory when the rxe driver is unloaded. This patch walks the red-black
tree holding multicast group elements and then walks the list of attached
qp's freeing the mca's and finally the mcg's.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mcast.c | 31 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index c560d467a972..74c5521e9b3d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,6 +29,8 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 
+	rxe_cleanup_mcast(rxe);
+
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 409efeecd581..0bc1b7e2877c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -44,6 +44,7 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 void rxe_cleanup_mcg(struct kref *kref);
+void rxe_cleanup_mcast(struct rxe_dev *rxe);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index ed23d0a270fd..00b4e3046d39 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -362,3 +362,34 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
 }
+
+/**
+ * rxe_cleanup_mcast - cleanup all resources held by mcast
+ * @rxe: rxe object
+ *
+ * Called when rxe device is unloaded. Walk red-black tree to
+ * find all mcg's and then walk mcg->qp_list to find all mca's and
+ * free them. These should have been freed already if apps are
+ * well behaved.
+ */
+void rxe_cleanup_mcast(struct rxe_dev *rxe)
+{
+	struct rb_root *root = &rxe->mcg_tree;
+	struct rb_node *node, *next;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mca, *tmp;
+
+	for (node = rb_first(root); node; node = next) {
+		next = rb_next(node);
+		mcg = rb_entry(node, typeof(*mcg), node);
+
+		spin_lock_bh(&rxe->mcg_lock);
+		list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list)
+			kfree(mca);
+
+		__rxe_remove_mcg(mcg);
+		spin_unlock_bh(&rxe->mcg_lock);
+
+		kfree(mcg);
+	}
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 16/17] RDMA/rxe: Add comments to rxe_mcast.c
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (14 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 15/17] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-01-31 22:08 ` [PATCH for-next v10 17/17] RDMA/rxe: Finish cleanup of rxe_mcast.c Bob Pearson
  2022-02-01 14:36 ` [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Jason Gunthorpe
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add comments to rxe_mcast.c.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 30 ++++++++++++++++++++++++++-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 00b4e3046d39..2fccf69f9a4b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -1,12 +1,33 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
+ * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved.
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
+/*
+ * rxe_mcast.c implements driver support for multicast transport.
+ * It is based on two data structures struct rxe_mcg ('mcg') and
+ * struct rxe_mca ('mca'). An mcg is allocated each time a qp is
+ * attached to a new mgid for the first time. These are indexed by
+ * a red-black tree using the mgid. This data structure is searched
+ * for the mcg when a multicast packet is received and when another
+ * qp is attached to the same mgid. It is cleaned up when the last qp
+ * is detached from the mcg. Each time a qp is attached to an mcg an
+ * mca is created. It holds a pointer to the qp and is added to a list
+ * of qp's that are attached to the mcg. The qp_list is used to replicate
+ * mcast packets in the rxe receive path.
+ */
+
 #include "rxe.h"
-#include "rxe_loc.h"
 
+/**
+ * rxe_mcast_add - add multicast address to rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -16,6 +37,13 @@ static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_add(rxe->ndev, ll_addr);
 }
 
+/**
+ * rxe_mcast_delete - delete multicast address from rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 97d3a59e5c6f..72a913a8e0cb 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -358,7 +358,7 @@ struct rxe_mcg {
 	struct list_head	qp_list;
 	atomic_t		qp_num;
 	union ib_gid		mgid;
-	unsigned int		index;
+	unsigned int		index; /* debugging help */
 	u32			qkey;
 	u16			pkey;
 };
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH for-next v10 17/17] RDMA/rxe: Finish cleanup of rxe_mcast.c
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (15 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 16/17] RDMA/rxe: Add comments to rxe_mcast.c Bob Pearson
@ 2022-01-31 22:08 ` Bob Pearson
  2022-02-01 14:36 ` [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Jason Gunthorpe
  17 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-01-31 22:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Cleanup rxe_mcast.c code. Minor changes and complete comments.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 163 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 2 files changed, 124 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 2fccf69f9a4b..2e5b41063f83 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -175,6 +175,7 @@ static int __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	mcg->rxe = rxe;
 	mcg->index = rxe->mcg_next++;
 
+	/* take reference to protect pointer in red-black tree */
 	kref_get(&mcg->ref_cnt);
 	__rxe_insert_mcg(mcg);
 
@@ -263,6 +264,7 @@ void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 	struct rxe_dev *rxe = mcg->rxe;
 
 	__rxe_remove_mcg(mcg);
+	/* drop reference that protected pointer in red-black tree */
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
 	rxe_mcast_delete(rxe, &mcg->mgid);
@@ -282,11 +284,59 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 	spin_unlock_bh(&mcg->rxe->mcg_lock);
 }
 
-static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mcg *mcg)
+/**
+ * __rxe_init_mca - initialize a new mca holding lock
+ * @qp: qp object
+ * @mcg: mcg object
+ * @mca: empty space for new mca
+ *
+ * Context: caller must hold references on qp and mcg, rxe->mcg_lock
+ * and pass memory for new mca
+ *
+ * Returns: 0 on success else an error
+ */
+static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
+			  struct rxe_mca *mca)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	int n;
+
+	n = atomic_inc_return(&rxe->mcg_attach);
+	if (n > rxe->attr.max_total_mcast_qp_attach) {
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	n = atomic_inc_return(&mcg->qp_num);
+	if (n > rxe->attr.max_mcast_qp_attach) {
+		atomic_dec(&mcg->qp_num);
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	atomic_inc(&qp->mcg_num);
+
+	rxe_add_ref(qp);
+	mca->qp = qp;
+
+	list_add_tail(&mca->qp_list, &mcg->qp_list);
+
+	return 0;
+}
+
+/**
+ * rxe_attach_mcg - attach qp to mcg if not already attached
+ * @mcg: mcg object
+ * @qp: qp object
+ *
+ * Context: caller must hold reference on qp and mcg.
+ * Returns: 0 on success else an error
+ */
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
+	struct rxe_dev *rxe = mcg->rxe;
+	struct rxe_mca *mca, *tmp;
 	int err;
-	struct rxe_mca *mca, *new_mca;
 
 	/* check to see if the qp is already a member of the group */
 	spin_lock_bh(&rxe->mcg_lock);
@@ -298,71 +348,84 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 	spin_unlock_bh(&rxe->mcg_lock);
 
-	/* speculative alloc new mca without using GFP_ATOMIC */
-	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
-	if (!new_mca)
+	/* speculative alloc new mca */
+	mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!mca)
 		return -ENOMEM;
 
 	spin_lock_bh(&rxe->mcg_lock);
 	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			kfree(new_mca);
+			kfree(mca);
 			err = 0;
-			goto out;
+			goto done;
 		}
 	}
-	mca = new_mca;
 
-	if (atomic_read(&mcg->qp_num) >= rxe->attr.max_mcast_qp_attach) {
-		err = -ENOMEM;
-		goto out;
-	}
+	err = __rxe_init_mca(qp, mcg, mca);
+	if (err)
+		kfree(mca);
+done:
+	spin_unlock_bh(&rxe->mcg_lock);
 
-	atomic_inc(&mcg->qp_num);
-	mca->qp = qp;
-	atomic_inc(&qp->mcg_num);
+	return err;
+}
 
-	list_add_tail(&mca->qp_list, &mcg->qp_list);
+/**
+ * __rxe_cleanup_mca - cleanup mca object holding lock
+ * @mca: mca object
+ * @mcg: mcg object
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
+{
+	list_del(&mca->qp_list);
 
-	err = 0;
-out:
-	spin_unlock_bh(&rxe->mcg_lock);
-	return err;
+	rxe_drop_ref(mca->qp);
+
+	atomic_dec(&mcg->qp_num);
+	atomic_dec(&mcg->rxe->mcg_attach);
+	atomic_dec(&mca->qp->mcg_num);
 }
 
-static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-				   union ib_gid *mgid)
+/**
+ * rxe_detach_mcg - detach qp from mcg
+ * @mcg: mcg object
+ * @qp: qp object
+ *
+ * Returns: 0 on success else an error if qp is not attached.
+ */
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
-	struct rxe_mcg *mcg;
+	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca, *tmp;
 
-	mcg = rxe_lookup_mcg(rxe, mgid);
-	if (!mcg)
-		goto err1;
-
 	spin_lock_bh(&rxe->mcg_lock);
-
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			list_del(&mca->qp_list);
-			if (atomic_dec_return(&mcg->qp_num) <= 0)
+			__rxe_cleanup_mca(mca, mcg);
+			if (atomic_read(&mcg->qp_num) <= 0)
 				__rxe_destroy_mcg(mcg);
-			atomic_dec(&qp->mcg_num);
-
 			spin_unlock_bh(&rxe->mcg_lock);
-			kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
-
 	spin_unlock_bh(&rxe->mcg_lock);
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-err1:
+
 	return -EINVAL;
 }
 
+/**
+ * rxe_attach_mcast - attach qp to multicast group (see IBA-11.3.1)
+ * @ibqp: (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	int err;
@@ -374,8 +437,11 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	if (err)
 		return err;
 
-	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
+	err = rxe_attach_mcg(mcg, qp);
 
+	/* this can happen if we failed to attach a first qp to mcg
+	 * go ahead and destroy mcg
+	 */
 	if (atomic_read(&mcg->qp_num) == 0)
 		rxe_destroy_mcg(mcg);
 
@@ -383,12 +449,29 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	return err;
 }
 
+/**
+ * rxe_detach_mcast - detach qp from multicast group (see IBA-11.3.2)
+ * @ibqp: address of (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mcg *mcg;
+	int err;
 
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (!mcg)
+		return -EINVAL;
+
+	err = rxe_detach_mcg(mcg, qp);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
+	return err;
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 72a913a8e0cb..716f11ec80fe 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -401,6 +401,7 @@ struct rxe_dev {
 	spinlock_t		mcg_lock; /* guard multicast groups */
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
+	atomic_t		mcg_attach;
 	unsigned int		mcg_next;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
@ 2022-02-01  1:03   ` kernel test robot
  2022-02-01 14:53   ` Jason Gunthorpe
  2022-02-01 18:20   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-02-01  1:03 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma; +Cc: kbuild-all, Bob Pearson

Hi Bob,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.17-rc2]
[also build test WARNING on next-20220131]
[cannot apply to rdma/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Bob-Pearson/Move-two-object-pools-to-rxe_mcast-c/20220201-061231
base:    26291c54e111ff6ba87a164d85d4a4e134b7315c
config: ia64-allyesconfig (https://download.01.org/0day-ci/archive/20220201/202202010836.EmoG3Ot8-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f9d560658bbbd5a17cc3c62e566cb9bb77697530
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bob-Pearson/Move-two-object-pools-to-rxe_mcast-c/20220201-061231
        git checkout f9d560658bbbd5a17cc3c62e566cb9bb77697530
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash drivers/infiniband/sw/rxe/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/infiniband/sw/rxe/rxe_mcast.c:57:6: warning: no previous prototype for '__rxe_destroy_mcg' [-Wmissing-prototypes]
      57 | void __rxe_destroy_mcg(struct rxe_mcg *grp)
         |      ^~~~~~~~~~~~~~~~~


vim +/__rxe_destroy_mcg +57 drivers/infiniband/sw/rxe/rxe_mcast.c

    55	
    56	/* caller is holding a ref from lookup and mcg->mcg_lock*/
  > 57	void __rxe_destroy_mcg(struct rxe_mcg *grp)
    58	{
    59		rxe_drop_key(grp);
    60		rxe_drop_ref(grp);
    61	
    62		rxe_mcast_delete(grp->rxe, &grp->mgid);
    63	}
    64	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c
  2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
                   ` (16 preceding siblings ...)
  2022-01-31 22:08 ` [PATCH for-next v10 17/17] RDMA/rxe: Finish cleanup of rxe_mcast.c Bob Pearson
@ 2022-02-01 14:36 ` Jason Gunthorpe
  17 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2022-02-01 14:36 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Mon, Jan 31, 2022 at 04:08:33PM -0600, Bob Pearson wrote:
> This patch series separates the mc_grp and mc_elem object pools from
> rxe_pools.c and moves their code to rxe_mcast.c. This makes sense because
> these two pools are different from the other pools as the only ones that
> do not share objects with rdma-core and that use key's instead of
> indices to enable looking up objects. This change will enable a significant
> simplification of the normal object pools.
> 
> Note: the independent implementation takes references to multicast groups
> and also to pointers stored in the red-black trees holding the keys. The
> reference handling code is moved to be adjacent to the code that manages
> the pointers.
> 
> This patch series applies cleanly to current for-next.
> commit e783362eb54cd99b2cac8b3a9aeac942e6f6ac07 (tag: v5.17-rc1,
> 		origin/wip/jgg-for-rc, origin/wip/jgg-for-next,
> 		origin/wip/for-testing, origin/for-rc,
> 		origin/for-next, origin/HEAD, for-next)

This is old, by the time I sent you the previos notes the branches
moved ahead to d3f6899b0b5617e8900d6b1ae60414e611b1a0f1 - and as I
said I already too many of these patches

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
  2022-02-01  1:03   ` kernel test robot
@ 2022-02-01 14:53   ` Jason Gunthorpe
  2022-02-01 20:00     ` Bob Pearson
  2022-02-01 18:20   ` kernel test robot
  2 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2022-02-01 14:53 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Mon, Jan 31, 2022 at 04:08:40PM -0600, Bob Pearson wrote:
> Remove rxe_mca (was rxe_mc_elem) from rxe pools and use kzmalloc
> and kfree to allocate and free. Use the sequence
> 
>     <lookup qp>
>     new_mca = kzalloc(sizeof(*new_mca), GFP_KERNEL);
>     <spin lock>
>     <lookup qp again> /* in case of a race */
>     <init new_mca>
>     <spin unlock>
> 
> instead of GFP_ATOMIC inside of the spinlock. Add an extra reference
> to multicast group to protect the pointer in the index that maps
> mgid to group.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>  drivers/infiniband/sw/rxe/rxe.c       |   8 --
>  drivers/infiniband/sw/rxe/rxe_mcast.c | 102 +++++++++++++++-----------
>  drivers/infiniband/sw/rxe/rxe_pool.c  |   5 --
>  drivers/infiniband/sw/rxe/rxe_pool.h  |   1 -
>  drivers/infiniband/sw/rxe/rxe_verbs.h |   2 -
>  5 files changed, 59 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index fab291245366..c55736e441e7 100644
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
>  	rxe_pool_cleanup(&rxe->mr_pool);
>  	rxe_pool_cleanup(&rxe->mw_pool);
>  	rxe_pool_cleanup(&rxe->mc_grp_pool);
> -	rxe_pool_cleanup(&rxe->mc_elem_pool);
>  
>  	if (rxe->tfm)
>  		crypto_free_shash(rxe->tfm);
> @@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
>  	if (err)
>  		goto err9;
>  
> -	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
> -			    rxe->attr.max_total_mcast_qp_attach);
> -	if (err)
> -		goto err10;
> -
>  	return 0;
>  
> -err10:
> -	rxe_pool_cleanup(&rxe->mc_grp_pool);
>  err9:
>  	rxe_pool_cleanup(&rxe->mw_pool);
>  err8:
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 9336295c4ee2..4a5896a225a6 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -26,30 +26,40 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
>  }
>  
>  /* caller should hold mc_grp_pool->pool_lock */
> -static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
> -				     struct rxe_pool *pool,
> -				     union ib_gid *mgid)
> +static int __rxe_create_grp(struct rxe_dev *rxe, struct rxe_pool *pool,
> +			    union ib_gid *mgid, struct rxe_mcg **grp_p)
>  {
>  	int err;
>  	struct rxe_mcg *grp;
>  
>  	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
>  	if (!grp)
> -		return ERR_PTR(-ENOMEM);
> +		return -ENOMEM;
> +
> +	err = rxe_mcast_add(rxe, mgid);
> +	if (unlikely(err)) {
> +		rxe_drop_ref(grp);
> +		return err;
> +	}
>  
>  	INIT_LIST_HEAD(&grp->qp_list);
>  	spin_lock_init(&grp->mcg_lock);
>  	grp->rxe = rxe;
> +
> +	rxe_add_ref(grp);
>  	rxe_add_key_locked(grp, mgid);
>  
> -	err = rxe_mcast_add(rxe, mgid);
> -	if (unlikely(err)) {
> -		rxe_drop_key_locked(grp);
> -		rxe_drop_ref(grp);
> -		return ERR_PTR(err);
> -	}
> +	*grp_p = grp;

This should return the struct rxe_mcg or an ERR_PTR not use output
function arguments, like it was before

> +	return 0;
> +}
> +
> +/* caller is holding a ref from lookup and mcg->mcg_lock*/
> +void __rxe_destroy_mcg(struct rxe_mcg *grp)
> +{
> +	rxe_drop_key(grp);
> +	rxe_drop_ref(grp);
>  
> -	return grp;
> +	rxe_mcast_delete(grp->rxe, &grp->mgid);
>  }
>  
>  static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
> @@ -68,10 +78,9 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>  	if (grp)
>  		goto done;
>  
> -	grp = create_grp(rxe, pool, mgid);
> -	if (IS_ERR(grp)) {
> +	err = __rxe_create_grp(rxe, pool, mgid, &grp);
> +	if (err) {
>  		write_unlock_bh(&pool->pool_lock);
> -		err = PTR_ERR(grp);
>  		return err;

This should return the struct rxe_mcg or ERR_PTR too

> @@ -126,7 +143,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>  				   union ib_gid *mgid)
>  {
>  	struct rxe_mcg *grp;
> -	struct rxe_mca *elem, *tmp;
> +	struct rxe_mca *mca, *tmp;
>  
>  	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
>  	if (!grp)
> @@ -134,33 +151,30 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>  
>  	spin_lock_bh(&grp->mcg_lock);
>  
> +	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
> +		if (mca->qp == qp) {
> +			list_del(&mca->qp_list);
>  			grp->num_qp--;
> +			if (grp->num_qp <= 0)
> +				__rxe_destroy_mcg(grp);
>  			atomic_dec(&qp->mcg_num);
>  
>  			spin_unlock_bh(&grp->mcg_lock);
> +			rxe_drop_ref(grp);

Wwhere did this extra ref come from?

The grp was threaded on a linked list by rxe_attach_mcast, but that
also dropped the extra reference.

The reference should have followed the pointer into the linked list,
then it could be unput here when unthreading it from the linked list.

To me this still looks like there should be exactly one ref, the
rbtree should be removed inside the release function when the ref goes
to zero (under the pool_lock) and doesn't otherwise hold a ref

The linked list on the mca should hold the only ref and when
all the linked list is put back the grp can be released, no need for
qp_num as well.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
  2022-02-01  1:03   ` kernel test robot
  2022-02-01 14:53   ` Jason Gunthorpe
@ 2022-02-01 18:20   ` kernel test robot
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-02-01 18:20 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma; +Cc: llvm, kbuild-all, Bob Pearson

Hi Bob,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.17-rc2]
[cannot apply to rdma/for-next next-20220201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Bob-Pearson/Move-two-object-pools-to-rxe_mcast-c/20220201-061231
base:    26291c54e111ff6ba87a164d85d4a4e134b7315c
config: arm-randconfig-r001-20220131 (https://download.01.org/0day-ci/archive/20220202/202202020207.bxuWR3xX-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/f9d560658bbbd5a17cc3c62e566cb9bb77697530
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bob-Pearson/Move-two-object-pools-to-rxe_mcast-c/20220201-061231
        git checkout f9d560658bbbd5a17cc3c62e566cb9bb77697530
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/infiniband/sw/rxe/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/infiniband/sw/rxe/rxe_mcast.c:57:6: warning: no previous prototype for function '__rxe_destroy_mcg' [-Wmissing-prototypes]
   void __rxe_destroy_mcg(struct rxe_mcg *grp)
        ^
   drivers/infiniband/sw/rxe/rxe_mcast.c:57:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __rxe_destroy_mcg(struct rxe_mcg *grp)
   ^
   static 
   1 warning generated.


vim +/__rxe_destroy_mcg +57 drivers/infiniband/sw/rxe/rxe_mcast.c

    55	
    56	/* caller is holding a ref from lookup and mcg->mcg_lock*/
  > 57	void __rxe_destroy_mcg(struct rxe_mcg *grp)
    58	{
    59		rxe_drop_key(grp);
    60		rxe_drop_ref(grp);
    61	
    62		rxe_mcast_delete(grp->rxe, &grp->mgid);
    63	}
    64	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-02-01 14:53   ` Jason Gunthorpe
@ 2022-02-01 20:00     ` Bob Pearson
  2022-02-01 20:14       ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Bob Pearson @ 2022-02-01 20:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 2/1/22 08:53, Jason Gunthorpe wrote:
> On Mon, Jan 31, 2022 at 04:08:40PM -0600, Bob Pearson wrote:
>> Remove rxe_mca (was rxe_mc_elem) from rxe pools and use kzmalloc
>> and kfree to allocate and free. Use the sequence
>>
>>     <lookup qp>
>>     new_mca = kzalloc(sizeof(*new_mca), GFP_KERNEL);
>>     <spin lock>
>>     <lookup qp again> /* in case of a race */
>>     <init new_mca>
>>     <spin unlock>
>>
>> instead of GFP_ATOMIC inside of the spinlock. Add an extra reference
>> to multicast group to protect the pointer in the index that maps
>> mgid to group.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>  drivers/infiniband/sw/rxe/rxe.c       |   8 --
>>  drivers/infiniband/sw/rxe/rxe_mcast.c | 102 +++++++++++++++-----------
>>  drivers/infiniband/sw/rxe/rxe_pool.c  |   5 --
>>  drivers/infiniband/sw/rxe/rxe_pool.h  |   1 -
>>  drivers/infiniband/sw/rxe/rxe_verbs.h |   2 -
>>  5 files changed, 59 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
>> index fab291245366..c55736e441e7 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe.c
>> @@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
>>  	rxe_pool_cleanup(&rxe->mr_pool);
>>  	rxe_pool_cleanup(&rxe->mw_pool);
>>  	rxe_pool_cleanup(&rxe->mc_grp_pool);
>> -	rxe_pool_cleanup(&rxe->mc_elem_pool);
>>  
>>  	if (rxe->tfm)
>>  		crypto_free_shash(rxe->tfm);
>> @@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
>>  	if (err)
>>  		goto err9;
>>  
>> -	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
>> -			    rxe->attr.max_total_mcast_qp_attach);
>> -	if (err)
>> -		goto err10;
>> -
>>  	return 0;
>>  
>> -err10:
>> -	rxe_pool_cleanup(&rxe->mc_grp_pool);
>>  err9:
>>  	rxe_pool_cleanup(&rxe->mw_pool);
>>  err8:
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> index 9336295c4ee2..4a5896a225a6 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> @@ -26,30 +26,40 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
>>  }
>>  
>>  /* caller should hold mc_grp_pool->pool_lock */
>> -static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
>> -				     struct rxe_pool *pool,
>> -				     union ib_gid *mgid)
>> +static int __rxe_create_grp(struct rxe_dev *rxe, struct rxe_pool *pool,
>> +			    union ib_gid *mgid, struct rxe_mcg **grp_p)
>>  {
>>  	int err;
>>  	struct rxe_mcg *grp;
>>  
>>  	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
>>  	if (!grp)
>> -		return ERR_PTR(-ENOMEM);
>> +		return -ENOMEM;
>> +
>> +	err = rxe_mcast_add(rxe, mgid);
>> +	if (unlikely(err)) {
>> +		rxe_drop_ref(grp);
>> +		return err;
>> +	}
>>  
>>  	INIT_LIST_HEAD(&grp->qp_list);
>>  	spin_lock_init(&grp->mcg_lock);
>>  	grp->rxe = rxe;
>> +
>> +	rxe_add_ref(grp);
>>  	rxe_add_key_locked(grp, mgid);
>>  
>> -	err = rxe_mcast_add(rxe, mgid);
>> -	if (unlikely(err)) {
>> -		rxe_drop_key_locked(grp);
>> -		rxe_drop_ref(grp);
>> -		return ERR_PTR(err);
>> -	}
>> +	*grp_p = grp;
> 
> This should return the struct rxe_mcg or an ERR_PTR not use output
> function arguments, like it was before
> 
>> +	return 0;
>> +}
>> +
>> +/* caller is holding a ref from lookup and mcg->mcg_lock*/
>> +void __rxe_destroy_mcg(struct rxe_mcg *grp)
>> +{
>> +	rxe_drop_key(grp);
>> +	rxe_drop_ref(grp);
>>  
>> -	return grp;
>> +	rxe_mcast_delete(grp->rxe, &grp->mgid);
>>  }
>>  
>>  static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>> @@ -68,10 +78,9 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
>>  	if (grp)
>>  		goto done;
>>  
>> -	grp = create_grp(rxe, pool, mgid);
>> -	if (IS_ERR(grp)) {
>> +	err = __rxe_create_grp(rxe, pool, mgid, &grp);
>> +	if (err) {
>>  		write_unlock_bh(&pool->pool_lock);
>> -		err = PTR_ERR(grp);
>>  		return err;
> 
> This should return the struct rxe_mcg or ERR_PTR too
> 
>> @@ -126,7 +143,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>>  				   union ib_gid *mgid)
>>  {
>>  	struct rxe_mcg *grp;
>> -	struct rxe_mca *elem, *tmp;
>> +	struct rxe_mca *mca, *tmp;
>>  
>>  	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
>>  	if (!grp)
>> @@ -134,33 +151,30 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>>  
>>  	spin_lock_bh(&grp->mcg_lock);
>>  
>> +	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
>> +		if (mca->qp == qp) {
>> +			list_del(&mca->qp_list);
>>  			grp->num_qp--;
>> +			if (grp->num_qp <= 0)
>> +				__rxe_destroy_mcg(grp);
>>  			atomic_dec(&qp->mcg_num);
>>  
>>  			spin_unlock_bh(&grp->mcg_lock);
>> +			rxe_drop_ref(grp);
> 
> Wwhere did this extra ref come from?
> 
> The grp was threaded on a linked list by rxe_attach_mcast, but that
> also dropped the extra reference.
> 
> The reference should have followed the pointer into the linked list,
> then it could be unput here when unthreading it from the linked list.
> 
> To me this still looks like there should be exactly one ref, the
> rbtree should be removed inside the release function when the ref goes
> to zero (under the pool_lock) and doesn't otherwise hold a ref
> 
> The linked list on the mca should hold the only ref and when
> all the linked list is put back the grp can be released, no need for
> qp_num as well.
> 
> Jason

Here is a rough picture of what is going on. When not active there are pointers to
each mcg in the red-black tree (from a parent of the rb_node contained in mcg) and
if there are attached qp's there are a pair of pointers from the head and tail of the
linked list else none. When active in either verbs or packet processing code there
is a pointer to mcg held in a local variable by the routine obtained from looking
up the mcg in the tree.

                                  +-----+
                                  | rxe |
                                  +-----+
                                     |
                                   +-v-+
                           +-------|rb |-------+
                           |       +---+       |
                           |                 +-v-+
                           |                 |rb |
                        +--v--+              +---+
                        |(rb) |
       +--------------->| mcg |<---------------+
       |                +-----+                |
       |                                       |
       |    +-----+     +-----+     +-----+    |
       +--->| mca |<--->| mca |<--->| mca |<---+
            +-----+     +-----+     +-----+
               |           |           |
               |           |           |
            +--v--+     +--v--+     +--v--+
            | qp  |     | qp  |     | qp  |
            +-----+     +-----+     +-----+

as currently written the local variable has a kref obtained from the kref_get in
rxe_lookup_mcg or the kref_init in rxe_init_mcg if it is newly created. This ref is
dropped when the local variable goes out of scope. To protect the mcg when it is
inactive at least one more ref is required. I take an additional ref in rxe_get_mcg
if the mcg is created to protect the pointer in the red-black tree. This persists
for the lifetime of the object until it is destroyed when it is removed from the tree
and has the longest scope. This is enough to keep the object alive (it works fine BTW.)
It is also possible to take ref's representing the pointers in the list but they
wouldn't add anything.

On the other point. Is it standard practice to user ERRPTRs in the kernel rather than
return arguments? I seem to have seen both styles used but it may have been my imagination. I don't have any preference here but sometimes choose one or the other
in parallel flows to make comparable routines in the flows have similar interfaces.

Bob

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-02-01 20:00     ` Bob Pearson
@ 2022-02-01 20:14       ` Jason Gunthorpe
  2022-02-01 20:30         ` Bob Pearson
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2022-02-01 20:14 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Feb 01, 2022 at 02:00:09PM -0600, Bob Pearson wrote:


> as currently written the local variable has a kref obtained from the kref_get in
> rxe_lookup_mcg or the kref_init in rxe_init_mcg if it is newly created. This ref is
> dropped when the local variable goes out of scope. To protect the mcg when it is
> inactive at least one more ref is required. I take an additional ref in rxe_get_mcg
> if the mcg is created to protect the pointer in the red-black tree. This persists
> for the lifetime of the object until it is destroyed when it is removed from the tree
> and has the longest scope. This is enough to keep the object alive (it works fine BTW.)
> It is also possible to take ref's representing the pointers in the list but they
> wouldn't add anything.

I think I got it upside down, but OK this works for me. What was
kbuild complaining about?

> On the other point. Is it standard practice to user ERRPTRs in the
> kernel rather than return arguments? I seem to have seen both styles
> used but it may have been my imagination. I don't have any
> preference here but sometimes choose one or the other in parallel
> flows to make comparable routines in the flows have similar
> interfaces.

Always prefer errptrs.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-02-01 20:14       ` Jason Gunthorpe
@ 2022-02-01 20:30         ` Bob Pearson
  0 siblings, 0 replies; 25+ messages in thread
From: Bob Pearson @ 2022-02-01 20:30 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 2/1/22 14:14, Jason Gunthorpe wrote:
> On Tue, Feb 01, 2022 at 02:00:09PM -0600, Bob Pearson wrote:
> 
> 
>> as currently written the local variable has a kref obtained from the kref_get in
>> rxe_lookup_mcg or the kref_init in rxe_init_mcg if it is newly created. This ref is
>> dropped when the local variable goes out of scope. To protect the mcg when it is
>> inactive at least one more ref is required. I take an additional ref in rxe_get_mcg
>> if the mcg is created to protect the pointer in the red-black tree. This persists
>> for the lifetime of the object until it is destroyed when it is removed from the tree
>> and has the longest scope. This is enough to keep the object alive (it works fine BTW.)
>> It is also possible to take ref's representing the pointers in the list but they
>> wouldn't add anything.
> 
> I think I got it upside down, but OK this works for me. What was
> kbuild complaining about?
> 
>> On the other point. Is it standard practice to user ERRPTRs in the
>> kernel rather than return arguments? I seem to have seen both styles
>> used but it may have been my imagination. I don't have any
>> preference here but sometimes choose one or the other in parallel
>> flows to make comparable routines in the flows have similar
>> interfaces.
> 
> Always prefer errptrs.
> 
> Jason

this API is a little different than most of the verbs APIs where typically we have

obj = xx_create_obj()
	takes a reference at kref_init() and doesn't drop it on the non error path.
	returns obj.

xx_other_code_using_obj(obj)
	uses obj holding ref

xx_destroy_obj(obj)
	drops the ref to obj

here the object is a local variable in xx_mcast_attach() and nothing is returned.

For InfiniBand the create and destroy mcast group function is triggered by MADs
from some central mcast server. For RoCEv2 that doesn't happen and these APIs
don't exist.

	
a missing static declaration at line 262. If you want me to I can fix it and resend.
Do I really have to mint new version?

Bob


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-02-01 20:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-31 22:08 [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 01/17] RDMA/rxe: Move rxe_mcast_add/delete " Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 02/17] RDMA/rxe: Move rxe_mcast_attach/detach " Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 03/17] RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 04/17] RDMA/rxe: Enforce IBA o10-2.2.3 Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 05/17] RDMA/rxe: Remove rxe_drop_all_macst_groups Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 06/17] RDMA/rxe: Remove qp->grp_lock and qp->grp_list Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 07/17] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
2022-02-01  1:03   ` kernel test robot
2022-02-01 14:53   ` Jason Gunthorpe
2022-02-01 20:00     ` Bob Pearson
2022-02-01 20:14       ` Jason Gunthorpe
2022-02-01 20:30         ` Bob Pearson
2022-02-01 18:20   ` kernel test robot
2022-01-31 22:08 ` [PATCH for-next v10 08/17] RDMA/rxe: Rename grp to mcg and mce to mca Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 09/17] RDMA/rxe: Introduce RXECB(skb) Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 10/17] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 11/17] RDMA/rxe: Replace mcg locks by rxe->mcg_lock Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 12/17] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 13/17] RDMA/rxe: Remove key'ed object support Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 14/17] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 15/17] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 16/17] RDMA/rxe: Add comments to rxe_mcast.c Bob Pearson
2022-01-31 22:08 ` [PATCH for-next v10 17/17] RDMA/rxe: Finish cleanup of rxe_mcast.c Bob Pearson
2022-02-01 14:36 ` [PATCH for-next v10 00/17] Move two object pools to rxe_mcast.c Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).