* [RFC PATCH v9 00/26]
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Several race conditions have been discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series
 - Makes several minor cleanups in rxe_pool.[ch]
 - Replaces the red-black trees currently used for indices with xarrays
 - Simplifies the API for keyed objects
 - Corrects several reference counting errors
 - Adds wait for completions to the paths in verbs APIs which destroy
   objects.

The patch series has been changed from PATCH for-next to RFC PATCH
because I have little experience with RCU locking and would like
someone else to review this code (in patches 18/26 and 24/26). RCU
locking should improve performance at large scale but this has not
been tested yet.
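
As background, here is a minimal sketch (illustrative only, not the
actual driver code; 'struct obj' and the index limits are made up) of
the xarray-plus-RCU pattern the series moves toward: indices are
allocated from an xarray instead of a red-black tree, and readers look
objects up under rcu_read_lock() while taking a reference that fails
if the object is already being torn down:

    /* illustrative sketch only -- not the rxe code itself */
    #include <linux/xarray.h>
    #include <linux/kref.h>

    struct obj {
            struct kref ref;
    };

    /* xa must be initialized with XA_FLAGS_ALLOC; objects must be
     * freed via kfree_rcu() for the reader side below to be safe
     */
    static int obj_add(struct xarray *xa, struct obj *obj, u32 *index)
    {
            /* allocate an index and store obj, replacing the rb-tree */
            return xa_alloc(xa, index, obj, XA_LIMIT(1, 1024), GFP_KERNEL);
    }

    static struct obj *obj_get(struct xarray *xa, u32 index)
    {
            struct obj *obj;

            rcu_read_lock();
            obj = xa_load(xa, index);
            /* fail if the object is already on its way out */
            if (obj && !kref_get_unless_zero(&obj->ref))
                    obj = NULL;
            rcu_read_unlock();

            return obj;
    }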

This patch series applies cleanly to current for-next.
commit e783362eb54cd99b2cac8b3a9aeac942e6f6ac07 (tag: v5.17-rc1,
		origin/wip/jgg-for-rc, origin/wip/jgg-for-next,
		origin/wip/for-testing, origin/for-rc,
		origin/for-next, origin/HEAD, for-next)

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v9
  Corrected issues reported by Jason Gunthorpe.
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU.
  Split up the patches into smaller changes.
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed a comment bug reported by the kernel test robot.
  Changed a type in __rxe_fini() in rxe_pool.c.
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (26):
  RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
  RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
  RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
  RDMA/rxe: Enforce IBA o10-2.2.3
  RDMA/rxe: Remove rxe_drop_all_mcast_groups
  RDMA/rxe: Remove qp->grp_lock and qp->grp_list
  RDMA/rxe: Use kzalloc/kfree for mca
  RDMA/rxe: Rename grp to mcg and mce to mca
  RDMA/rxe: Introduce RXECB(skb)
  RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
  RDMA/rxe: Replace locks by rxe->mcg_lock
  RDMA/rxe: Replace pool key by rxe->mcg_tree
  RDMA/rxe: Remove key'ed object support
  RDMA/rxe: Remove mcg from rxe pools
  RDMA/rxe: Add code to cleanup mcast memory
  RDMA/rxe: Add comments to rxe_mcast.c
  RDMA/rxe: Separate code into subroutines
  RDMA/rxe: Convert mca read locking to RCU
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Change pool locking to RCU
  RDMA/rxe: Add wait_for_completion to pool objects
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources

 drivers/infiniband/sw/rxe/rxe.c       | 107 +---
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h   |   3 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |  33 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 678 ++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_mr.c    |   2 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  11 +-
 drivers/infiniband/sw/rxe/rxe_net.c   |  35 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 798 ++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_pool.h  | 233 +++-----
 drivers/infiniband/sw/rxe/rxe_qp.c    |  29 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  |  98 ++--
 drivers/infiniband/sw/rxe/rxe_req.c   |  55 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  | 125 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  54 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  26 +-
 16 files changed, 1159 insertions(+), 1147 deletions(-)
 rewrite drivers/infiniband/sw/rxe/rxe_mcast.c (86%)
 rewrite drivers/infiniband/sw/rxe/rxe_pool.c (67%)
 rewrite drivers/infiniband/sw/rxe/rxe_pool.h (73%)

-- 
2.32.0



* [RFC PATCH v9 01/26] RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move rxe_mcast_add and rxe_mcast_delete from rxe_net.c to rxe_mcast.c,
make them static, and remove their declarations from rxe_loc.h.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  2 --
 drivers/infiniband/sw/rxe/rxe_mcast.c | 18 ++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_net.c   | 18 ------------------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..bcec33c3c3b7 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -106,8 +106,6 @@ int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb);
 const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
-int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid);
-int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid);
 
 /* rxe_qp.c */
 int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index bd1ac88b8700..e5689c161984 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -7,6 +7,24 @@
 #include "rxe.h"
 #include "rxe_loc.h"
 
+static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+
+	return dev_mc_add(rxe->ndev, ll_addr);
+}
+
+static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+	unsigned char ll_addr[ETH_ALEN];
+
+	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+
+	return dev_mc_del(rxe->ndev, ll_addr);
+}
+
 /* caller should hold mc_grp_pool->pool_lock */
 static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index be72bdbfb4ba..a8cfa7160478 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -20,24 +20,6 @@
 
 static struct rxe_recv_sockets recv_sockets;
 
-int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	unsigned char ll_addr[ETH_ALEN];
-
-	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
-
-	return dev_mc_add(rxe->ndev, ll_addr);
-}
-
-int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-	unsigned char ll_addr[ETH_ALEN];
-
-	ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
-
-	return dev_mc_del(rxe->ndev, ll_addr);
-}
-
 static struct dst_entry *rxe_find_route4(struct net_device *ndev,
 				  struct in_addr *saddr,
 				  struct in_addr *daddr)
-- 
2.32.0



* [RFC PATCH v9 02/26] RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move rxe_mcast_attach and rxe_mcast_detach from rxe_verbs.c to
rxe_mcast.c, make them non-static, and add their declarations to
rxe_loc.h. Make the subroutines in rxe_mcast.c referenced by these
routines static and remove their declarations from rxe_loc.h.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   | 12 ++-------
 drivers/infiniband/sw/rxe/rxe_mcast.c | 36 +++++++++++++++++++++++----
 drivers/infiniband/sw/rxe/rxe_verbs.c | 26 -------------------
 3 files changed, 33 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index bcec33c3c3b7..dc606241f0d6 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,18 +40,10 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p);
-
-int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mc_grp *grp);
-
-int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid);
-
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
-
 void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
+int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index e5689c161984..f86e32f4e77f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -52,8 +52,8 @@ static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 	return grp;
 }
 
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p)
+static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+			     struct rxe_mc_grp **grp_p)
 {
 	int err;
 	struct rxe_mc_grp *grp;
@@ -81,7 +81,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	return 0;
 }
 
-int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mc_grp *grp)
 {
 	int err;
@@ -125,8 +125,8 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
-int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid)
+static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
+				   union ib_gid *mgid)
 {
 	struct rxe_mc_grp *grp;
 	struct rxe_mc_elem *elem, *tmp;
@@ -194,3 +194,29 @@ void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 	rxe_drop_key(grp);
 	rxe_mcast_delete(rxe, &grp->mgid);
 }
+
+int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	int err;
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mc_grp *grp;
+
+	/* takes a ref on grp if successful */
+	err = rxe_mcast_get_grp(rxe, mgid, &grp);
+	if (err)
+		return err;
+
+	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+
+	rxe_drop_ref(grp);
+	return err;
+}
+
+int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
+{
+	struct rxe_dev *rxe = to_rdev(ibqp->device);
+	struct rxe_qp *qp = to_rqp(ibqp);
+
+	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 915ad6664321..f7682541f9af 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -999,32 +999,6 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 	return n;
 }
 
-static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
-{
-	int err;
-	struct rxe_dev *rxe = to_rdev(ibqp->device);
-	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mc_grp *grp;
-
-	/* takes a ref on grp if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
-	if (err)
-		return err;
-
-	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
-
-	rxe_drop_ref(grp);
-	return err;
-}
-
-static int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
-{
-	struct rxe_dev *rxe = to_rdev(ibqp->device);
-	struct rxe_qp *qp = to_rqp(ibqp);
-
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
-}
-
 static ssize_t parent_show(struct device *device,
 			   struct device_attribute *attr, char *buf)
 {
-- 
2.32.0



* [RFC PATCH v9 03/26] RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Rename rxe_mc_grp to rxe_mcg and rxe_mc_elem to rxe_mca. These can be
read as 'multicast group' and 'multicast attachment'. 'elem' collided
with the use of elem in rxe pools and was a little confusing.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 26 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_pool.c  | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  6 +++---
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index f86e32f4e77f..949784198d80 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -26,12 +26,12 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 }
 
 /* caller should hold mc_grp_pool->pool_lock */
-static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
+static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
 				     union ib_gid *mgid)
 {
 	int err;
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 
 	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
 	if (!grp)
@@ -53,10 +53,10 @@ static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
 }
 
 static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mc_grp **grp_p)
+			     struct rxe_mcg **grp_p)
 {
 	int err;
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
@@ -82,10 +82,10 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mc_grp *grp)
+			   struct rxe_mcg *grp)
 {
 	int err;
-	struct rxe_mc_elem *elem;
+	struct rxe_mca *elem;
 
 	/* check to see of the qp is already a member of the group */
 	spin_lock_bh(&qp->grp_lock);
@@ -128,8 +128,8 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
-	struct rxe_mc_grp *grp;
-	struct rxe_mc_elem *elem, *tmp;
+	struct rxe_mcg *grp;
+	struct rxe_mca *elem, *tmp;
 
 	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
 	if (!grp)
@@ -162,8 +162,8 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 {
-	struct rxe_mc_grp *grp;
-	struct rxe_mc_elem *elem;
+	struct rxe_mcg *grp;
+	struct rxe_mca *elem;
 
 	while (1) {
 		spin_lock_bh(&qp->grp_lock);
@@ -171,7 +171,7 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 			spin_unlock_bh(&qp->grp_lock);
 			break;
 		}
-		elem = list_first_entry(&qp->grp_list, struct rxe_mc_elem,
+		elem = list_first_entry(&qp->grp_list, struct rxe_mca,
 					grp_list);
 		list_del(&elem->grp_list);
 		spin_unlock_bh(&qp->grp_lock);
@@ -188,7 +188,7 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mc_grp *grp = container_of(elem, typeof(*grp), elem);
+	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
 	struct rxe_dev *rxe = grp->rxe;
 
 	rxe_drop_key(grp);
@@ -200,7 +200,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mc_grp *grp;
+	struct rxe_mcg *grp;
 
 	/* takes a ref on grp if successful */
 	err = rxe_mcast_get_grp(rxe, mgid, &grp);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 4cb003885e00..63c594173565 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -83,17 +83,17 @@ static const struct rxe_type_info {
 	},
 	[RXE_TYPE_MC_GRP] = {
 		.name		= "rxe-mc_grp",
-		.size		= sizeof(struct rxe_mc_grp),
-		.elem_offset	= offsetof(struct rxe_mc_grp, elem),
+		.size		= sizeof(struct rxe_mcg),
+		.elem_offset	= offsetof(struct rxe_mcg, elem),
 		.cleanup	= rxe_mc_cleanup,
 		.flags		= RXE_POOL_KEY,
-		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
+		.key_offset	= offsetof(struct rxe_mcg, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
 	[RXE_TYPE_MC_ELEM] = {
 		.name		= "rxe-mc_elem",
-		.size		= sizeof(struct rxe_mc_elem),
-		.elem_offset	= offsetof(struct rxe_mc_elem, elem),
+		.size		= sizeof(struct rxe_mca),
+		.elem_offset	= offsetof(struct rxe_mca, elem),
 	},
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 6a6cc1fa90e4..7ff6b53555f4 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -233,8 +233,8 @@ static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-	struct rxe_mc_grp *mcg;
-	struct rxe_mc_elem *mce;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mce;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
 	int err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e48969e8d4c8..388b7dc23dd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,7 +353,7 @@ struct rxe_mw {
 	u64			length;
 };
 
-struct rxe_mc_grp {
+struct rxe_mcg {
 	struct rxe_pool_elem	elem;
 	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
@@ -364,12 +364,12 @@ struct rxe_mc_grp {
 	u16			pkey;
 };
 
-struct rxe_mc_elem {
+struct rxe_mca {
 	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct list_head	grp_list;
 	struct rxe_qp		*qp;
-	struct rxe_mc_grp	*grp;
+	struct rxe_mcg		*grp;
 };
 
 struct rxe_port {
-- 
2.32.0



* [RFC PATCH v9 04/26] RDMA/rxe: Enforce IBA o10-2.2.3
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add code to check if a QP is attached to one or more multicast groups
when destroy_qp is called and return an error if so.
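
From user space the new behavior should surface roughly as follows (a
hedged libibverbs sketch, not part of the patch itself; the EBUSY
check reflects the error returned by the diff below):

    /* illustrative only, assuming libibverbs */
    #include <errno.h>
    #include <infiniband/verbs.h>

    static int check_o10_2_2_3(struct ibv_qp *qp, union ibv_gid *mgid,
                               uint16_t mlid)
    {
            int ret;

            ret = ibv_attach_mcast(qp, mgid, mlid);
            if (ret)
                    return ret;

            ret = ibv_destroy_qp(qp);       /* expected to fail: EBUSY */
            if (ret != EBUSY)
                    return -1;

            ibv_detach_mcast(qp, mgid, mlid);
            return ibv_destroy_qp(qp);      /* now succeeds */
    }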

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  9 +--------
 drivers/infiniband/sw/rxe/rxe_mcast.c |  2 ++
 drivers/infiniband/sw/rxe/rxe_qp.c    | 14 ++++++++++++++
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 +++++
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index dc606241f0d6..052beaaacf43 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -101,26 +101,19 @@ const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
 
 /* rxe_qp.c */
 int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
-
 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 		     struct ib_qp_init_attr *init,
 		     struct rxe_create_qp_resp __user *uresp,
 		     struct ib_pd *ibpd, struct ib_udata *udata);
-
 int rxe_qp_to_init(struct rxe_qp *qp, struct ib_qp_init_attr *init);
-
 int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
 		    struct ib_qp_attr *attr, int mask);
-
 int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr,
 		     int mask, struct ib_udata *udata);
-
 int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask);
-
 void rxe_qp_error(struct rxe_qp *qp);
-
+int rxe_qp_chk_destroy(struct rxe_qp *qp);
 void rxe_qp_destroy(struct rxe_qp *qp);
-
 void rxe_qp_cleanup(struct rxe_pool_elem *elem);
 
 static inline int qp_num(struct rxe_qp *qp)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 949784198d80..34e3c52f0b72 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -114,6 +114,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	grp->num_qp++;
 	elem->qp = qp;
 	elem->grp = grp;
+	atomic_inc(&qp->mcg_num);
 
 	list_add(&elem->qp_list, &grp->qp_list);
 	list_add(&elem->grp_list, &qp->grp_list);
@@ -143,6 +144,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			list_del(&elem->qp_list);
 			list_del(&elem->grp_list);
 			grp->num_qp--;
+			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
 			spin_unlock_bh(&qp->grp_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 5018b9387694..2af19b79dd23 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -770,6 +770,20 @@ int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask)
 	return 0;
 }
 
+int rxe_qp_chk_destroy(struct rxe_qp *qp)
+{
+	/* See IBA o10-2.2.3
+	 * An attempt to destroy a QP while attached to a mcast group
+	 * will fail immediately.
+	 */
+	if (atomic_read(&qp->mcg_num)) {
+		pr_warn_once("Attempt to destroy QP while attached to multicast group\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
 /* called by the destroy qp verb */
 void rxe_qp_destroy(struct rxe_qp *qp)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f7682541f9af..9f0aef4b649d 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -493,6 +493,11 @@ static int rxe_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
 	struct rxe_qp *qp = to_rqp(ibqp);
+	int ret;
+
+	ret = rxe_qp_chk_destroy(qp);
+	if (ret)
+		return ret;
 
 	rxe_qp_destroy(qp);
 	rxe_drop_index(qp);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 388b7dc23dd7..4910d0782e33 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -235,6 +235,7 @@ struct rxe_qp {
 	/* list of mcast groups qp has joined (for cleanup) */
 	struct list_head	grp_list;
 	spinlock_t		grp_lock; /* guard grp_list */
+	atomic_t		mcg_num;
 
 	struct sk_buff_head	req_pkts;
 	struct sk_buff_head	resp_pkts;
-- 
2.32.0



* [RFC PATCH v9 05/26] RDMA/rxe: Remove rxe_drop_all_mcast_groups
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

With IBA o10-2.2.3 enforced, rxe_drop_all_mcast_groups is completely
unnecessary. Remove it and all references to it.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 -
 drivers/infiniband/sw/rxe/rxe_mcast.c | 26 --------------------------
 drivers/infiniband/sw/rxe/rxe_qp.c    |  2 --
 3 files changed, 29 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 052beaaacf43..af40e3c212fb 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,7 +40,6 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
 void rxe_mc_cleanup(struct rxe_pool_elem *arg);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 34e3c52f0b72..39a41daa7a6b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -162,32 +162,6 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return -EINVAL;
 }
 
-void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
-{
-	struct rxe_mcg *grp;
-	struct rxe_mca *elem;
-
-	while (1) {
-		spin_lock_bh(&qp->grp_lock);
-		if (list_empty(&qp->grp_list)) {
-			spin_unlock_bh(&qp->grp_lock);
-			break;
-		}
-		elem = list_first_entry(&qp->grp_list, struct rxe_mca,
-					grp_list);
-		list_del(&elem->grp_list);
-		spin_unlock_bh(&qp->grp_lock);
-
-		grp = elem->grp;
-		spin_lock_bh(&grp->mcg_lock);
-		list_del(&elem->qp_list);
-		grp->num_qp--;
-		spin_unlock_bh(&grp->mcg_lock);
-		rxe_drop_ref(grp);
-		rxe_drop_ref(elem);
-	}
-}
-
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
 	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 2af19b79dd23..087126550caf 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -812,8 +812,6 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 {
 	struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work.work);
 
-	rxe_drop_all_mcast_groups(qp);
-
 	if (qp->sq.queue)
 		rxe_queue_cleanup(qp->sq.queue);
 
-- 
2.32.0



* [RFC PATCH v9 06/26] RDMA/rxe: Remove qp->grp_lock and qp->grp_list
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Since it is no longer required to clean up attachments to multicast
groups when a QP is destroyed, qp->grp_lock and qp->grp_list are
no longer needed and are removed.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 8 --------
 drivers/infiniband/sw/rxe/rxe_qp.c    | 3 ---
 drivers/infiniband/sw/rxe/rxe_verbs.h | 5 -----
 3 files changed, 16 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 39a41daa7a6b..9336295c4ee2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -88,7 +88,6 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *elem;
 
 	/* check to see of the qp is already a member of the group */
-	spin_lock_bh(&qp->grp_lock);
 	spin_lock_bh(&grp->mcg_lock);
 	list_for_each_entry(elem, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
@@ -113,16 +112,13 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	grp->num_qp++;
 	elem->qp = qp;
-	elem->grp = grp;
 	atomic_inc(&qp->mcg_num);
 
 	list_add(&elem->qp_list, &grp->qp_list);
-	list_add(&elem->grp_list, &qp->grp_list);
 
 	err = 0;
 out:
 	spin_unlock_bh(&grp->mcg_lock);
-	spin_unlock_bh(&qp->grp_lock);
 	return err;
 }
 
@@ -136,18 +132,15 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	if (!grp)
 		goto err1;
 
-	spin_lock_bh(&qp->grp_lock);
 	spin_lock_bh(&grp->mcg_lock);
 
 	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
 		if (elem->qp == qp) {
 			list_del(&elem->qp_list);
-			list_del(&elem->grp_list);
 			grp->num_qp--;
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
-			spin_unlock_bh(&qp->grp_lock);
 			rxe_drop_ref(elem);
 			rxe_drop_ref(grp);	/* ref held by QP */
 			rxe_drop_ref(grp);	/* ref from get_key */
@@ -156,7 +149,6 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	spin_unlock_bh(&grp->mcg_lock);
-	spin_unlock_bh(&qp->grp_lock);
 	rxe_drop_ref(grp);			/* ref from get_key */
 err1:
 	return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 087126550caf..742073ce0709 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -188,9 +188,6 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
 		break;
 	}
 
-	INIT_LIST_HEAD(&qp->grp_list);
-
-	spin_lock_init(&qp->grp_lock);
 	spin_lock_init(&qp->state_lock);
 
 	atomic_set(&qp->ssn, 0);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 4910d0782e33..55f8ed2bc621 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -232,9 +232,6 @@ struct rxe_qp {
 	struct rxe_av		pri_av;
 	struct rxe_av		alt_av;
 
-	/* list of mcast groups qp has joined (for cleanup) */
-	struct list_head	grp_list;
-	spinlock_t		grp_lock; /* guard grp_list */
 	atomic_t		mcg_num;
 
 	struct sk_buff_head	req_pkts;
@@ -368,9 +365,7 @@ struct rxe_mcg {
 struct rxe_mca {
 	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
-	struct list_head	grp_list;
 	struct rxe_qp		*qp;
-	struct rxe_mcg		*grp;
 };
 
 struct rxe_port {
-- 
2.32.0



* [RFC PATCH v9 07/26] RDMA/rxe: Use kzalloc/kfree for mca
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Remove rxe_mca (formerly rxe_mc_elem) from rxe pools and use kzalloc
and kfree to allocate and free it. Use the sequence

    <lookup qp>
    new_mca = kzalloc(sizeof(*new_mca), GFP_KERNEL);
    <spin lock>
    <lookup qp again> /* in case of a race */
    <init new_mca>
    <spin unlock>

instead of calling kzalloc with GFP_ATOMIC inside the spinlock. Add an
extra reference to the mcg when a new one is created and drop it when
the last qp is detached.
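
In generic form the pattern looks like this (an illustrative sketch;
already_present() and init_and_link() are hypothetical stand-ins, not
functions in this patch):

    static int attach_qp(struct rxe_mcg *mcg, struct rxe_qp *qp)
    {
            struct rxe_mca *new;

            /* speculative allocation outside the lock: GFP_KERNEL ok */
            new = kzalloc(sizeof(*new), GFP_KERNEL);
            if (!new)
                    return -ENOMEM;

            spin_lock_bh(&mcg->mcg_lock);
            if (already_present(mcg, qp)) {
                    /* lost the race: someone else attached qp first */
                    spin_unlock_bh(&mcg->mcg_lock);
                    kfree(new);
                    return 0;
            }
            init_and_link(mcg, new, qp);
            spin_unlock_bh(&mcg->mcg_lock);
            return 0;
    }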

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  8 -----
 drivers/infiniband/sw/rxe/rxe_mcast.c | 51 ++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_pool.c  |  5 ---
 drivers/infiniband/sw/rxe/rxe_pool.h  |  1 -
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 --
 5 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index fab291245366..c55736e441e7 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,7 +29,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 	rxe_pool_cleanup(&rxe->mc_grp_pool);
-	rxe_pool_cleanup(&rxe->mc_elem_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -163,15 +162,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err9;
 
-	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
-			    rxe->attr.max_total_mcast_qp_attach);
-	if (err)
-		goto err10;
-
 	return 0;
 
-err10:
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 err9:
 	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 9336295c4ee2..39f38ee665f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -36,6 +36,7 @@ static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
 	if (!grp)
 		return ERR_PTR(-ENOMEM);
+	rxe_add_ref(grp);
 
 	INIT_LIST_HEAD(&grp->qp_list);
 	spin_lock_init(&grp->mcg_lock);
@@ -85,12 +86,28 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mcg *grp)
 {
 	int err;
-	struct rxe_mca *elem;
+	struct rxe_mca *mca, *new_mca;
 
-	/* check to see of the qp is already a member of the group */
+	/* check to see if the qp is already a member of the group */
 	spin_lock_bh(&grp->mcg_lock);
-	list_for_each_entry(elem, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
+	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			spin_unlock_bh(&grp->mcg_lock);
+			return 0;
+		}
+	}
+	spin_unlock_bh(&grp->mcg_lock);
+
+	/* speculative alloc new mca without using GFP_ATOMIC */
+	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+	if (!new_mca)
+		return -ENOMEM;
+
+	spin_lock_bh(&grp->mcg_lock);
+	/* re-check to see if someone else just attached qp */
+	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			kfree(new_mca);
 			err = 0;
 			goto out;
 		}
@@ -101,20 +118,11 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		goto out;
 	}
 
-	elem = rxe_alloc_locked(&rxe->mc_elem_pool);
-	if (!elem) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	/* each qp holds a ref on the grp */
-	rxe_add_ref(grp);
-
 	grp->num_qp++;
-	elem->qp = qp;
+	new_mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&elem->qp_list, &grp->qp_list);
+	list_add(&new_mca->qp_list, &grp->qp_list);
 
 	err = 0;
 out:
@@ -126,7 +134,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
 	struct rxe_mcg *grp;
-	struct rxe_mca *elem, *tmp;
+	struct rxe_mca *mca, *tmp;
 
 	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
 	if (!grp)
@@ -134,16 +142,17 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	spin_lock_bh(&grp->mcg_lock);
 
-	list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
-		if (elem->qp == qp) {
-			list_del(&elem->qp_list);
+	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+		if (mca->qp == qp) {
+			list_del(&mca->qp_list);
 			grp->num_qp--;
+			if (grp->num_qp <= 0)
+				rxe_drop_ref(grp);
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&grp->mcg_lock);
-			rxe_drop_ref(elem);
-			rxe_drop_ref(grp);	/* ref held by QP */
 			rxe_drop_ref(grp);	/* ref from get_key */
+			kfree(mca);
 			return 0;
 		}
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 63c594173565..a6756aa93e2b 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -90,11 +90,6 @@ static const struct rxe_type_info {
 		.key_offset	= offsetof(struct rxe_mcg, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
-	[RXE_TYPE_MC_ELEM] = {
-		.name		= "rxe-mc_elem",
-		.size		= sizeof(struct rxe_mca),
-		.elem_offset	= offsetof(struct rxe_mca, elem),
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 214279310f4d..511f81554fd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -23,7 +23,6 @@ enum rxe_elem_type {
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
 	RXE_TYPE_MC_GRP,
-	RXE_TYPE_MC_ELEM,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 55f8ed2bc621..02745d51c163 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -363,7 +363,6 @@ struct rxe_mcg {
 };
 
 struct rxe_mca {
-	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct rxe_qp		*qp;
 };
@@ -397,7 +396,6 @@ struct rxe_dev {
 	struct rxe_pool		mr_pool;
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
-	struct rxe_pool		mc_elem_pool;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0



* [RFC PATCH v9 08/26] RDMA/rxe: Rename grp to mcg and mce to mca
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

In rxe_mcast.c and rxe_recv.c replace 'grp' by 'mcg' and 'mce' by 'mca'.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 102 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 39f38ee665f2..ed1b9ca65da3 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -31,33 +31,33 @@ static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 				     union ib_gid *mgid)
 {
 	int err;
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 
-	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
-	if (!grp)
+	mcg = rxe_alloc_locked(&rxe->mc_grp_pool);
+	if (!mcg)
 		return ERR_PTR(-ENOMEM);
-	rxe_add_ref(grp);
+	rxe_add_ref(mcg);
 
-	INIT_LIST_HEAD(&grp->qp_list);
-	spin_lock_init(&grp->mcg_lock);
-	grp->rxe = rxe;
-	rxe_add_key_locked(grp, mgid);
+	INIT_LIST_HEAD(&mcg->qp_list);
+	spin_lock_init(&mcg->mcg_lock);
+	mcg->rxe = rxe;
+	rxe_add_key_locked(mcg, mgid);
 
 	err = rxe_mcast_add(rxe, mgid);
 	if (unlikely(err)) {
-		rxe_drop_key_locked(grp);
-		rxe_drop_ref(grp);
+		rxe_drop_key_locked(mcg);
+		rxe_drop_ref(mcg);
 		return ERR_PTR(err);
 	}
 
-	return grp;
+	return mcg;
 }
 
 static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mcg **grp_p)
+			     struct rxe_mcg **mcgp)
 {
 	int err;
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
@@ -65,47 +65,47 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	write_lock_bh(&pool->pool_lock);
 
-	grp = rxe_pool_get_key_locked(pool, mgid);
-	if (grp)
+	mcg = rxe_pool_get_key_locked(pool, mgid);
+	if (mcg)
 		goto done;
 
-	grp = create_grp(rxe, pool, mgid);
-	if (IS_ERR(grp)) {
+	mcg = create_grp(rxe, pool, mgid);
+	if (IS_ERR(mcg)) {
 		write_unlock_bh(&pool->pool_lock);
-		err = PTR_ERR(grp);
+		err = PTR_ERR(mcg);
 		return err;
 	}
 
 done:
 	write_unlock_bh(&pool->pool_lock);
-	*grp_p = grp;
+	*mcgp = mcg;
 	return 0;
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mcg *grp)
+			   struct rxe_mcg *mcg)
 {
 	int err;
 	struct rxe_mca *mca, *new_mca;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&grp->mcg_lock);
-	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+	spin_lock_bh(&mcg->mcg_lock);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&grp->mcg_lock);
+			spin_unlock_bh(&mcg->mcg_lock);
 			return 0;
 		}
 	}
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&mcg->mcg_lock);
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
 	if (!new_mca)
 		return -ENOMEM;
 
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&mcg->mcg_lock);
 	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(mca, &grp->qp_list, qp_list) {
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			kfree(new_mca);
 			err = 0;
@@ -113,63 +113,63 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		}
 	}
 
-	if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	grp->num_qp++;
+	mcg->num_qp++;
 	new_mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&new_mca->qp_list, &grp->qp_list);
+	list_add(&new_mca->qp_list, &mcg->qp_list);
 
 	err = 0;
 out:
-	spin_unlock_bh(&grp->mcg_lock);
+	spin_unlock_bh(&mcg->mcg_lock);
 	return err;
 }
 
 static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				   union ib_gid *mgid)
 {
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 	struct rxe_mca *mca, *tmp;
 
-	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
-	if (!grp)
+	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	if (!mcg)
 		goto err1;
 
-	spin_lock_bh(&grp->mcg_lock);
+	spin_lock_bh(&mcg->mcg_lock);
 
-	list_for_each_entry_safe(mca, tmp, &grp->qp_list, qp_list) {
+	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
-			grp->num_qp--;
-			if (grp->num_qp <= 0)
-				rxe_drop_ref(grp);
+			mcg->num_qp--;
+			if (mcg->num_qp <= 0)
+				rxe_drop_ref(mcg);
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&grp->mcg_lock);
-			rxe_drop_ref(grp);	/* ref from get_key */
+			spin_unlock_bh(&mcg->mcg_lock);
+			rxe_drop_ref(mcg);	/* ref from get_key */
 			kfree(mca);
 			return 0;
 		}
 	}
 
-	spin_unlock_bh(&grp->mcg_lock);
-	rxe_drop_ref(grp);			/* ref from get_key */
+	spin_unlock_bh(&mcg->mcg_lock);
+	rxe_drop_ref(mcg);			/* ref from get_key */
 err1:
 	return -EINVAL;
 }
 
 void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mcg *grp = container_of(elem, typeof(*grp), elem);
-	struct rxe_dev *rxe = grp->rxe;
+	struct rxe_mcg *mcg = container_of(elem, typeof(*mcg), elem);
+	struct rxe_dev *rxe = mcg->rxe;
 
-	rxe_drop_key(grp);
-	rxe_mcast_delete(rxe, &grp->mgid);
+	rxe_drop_key(mcg);
+	rxe_mcast_delete(rxe, &mcg->mgid);
 }
 
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
@@ -177,16 +177,16 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
-	struct rxe_mcg *grp;
+	struct rxe_mcg *mcg;
 
-	/* takes a ref on grp if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
+	/* takes a ref on mcg if successful */
+	err = rxe_mcast_get_grp(rxe, mgid, &mcg);
 	if (err)
 		return err;
 
-	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
+	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
 
-	rxe_drop_ref(grp);
+	rxe_drop_ref(mcg);
 	return err;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7ff6b53555f4..814a002b8911 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -234,7 +234,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 {
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
 	struct rxe_mcg *mcg;
-	struct rxe_mca *mce;
+	struct rxe_mca *mca;
 	struct rxe_qp *qp;
 	union ib_gid dgid;
 	int err;
@@ -257,8 +257,8 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	 * single QP happen and just move on and try
 	 * the rest of them on the list
 	 */
-	list_for_each_entry(mce, &mcg->qp_list, qp_list) {
-		qp = mce->qp;
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		qp = mca->qp;
 
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
@@ -273,7 +273,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		 * skb and pass to the QP. Pass the original skb to
 		 * the last QP in the list.
 		 */
-		if (mce->qp_list.next != &mcg->qp_list) {
+		if (mca->qp_list.next != &mcg->qp_list) {
 			struct sk_buff *cskb;
 			struct rxe_pkt_info *cpkt;
 
-- 
2.32.0



* [RFC PATCH v9 09/26] RDMA/rxe: Introduce RXECB(skb)
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add a #define RXECB(skb) to rxe_hdr.h as a shortcut for referring to
individual members of the rxe_pkt_info struct, which is stored in
skb->cb in the receive path. Use this to make some cleanups in
rxe_recv.c.
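
For example, with this macro the receive path can write (as the diff
below does):

    struct rxe_pkt_info *pkt = RXECB(skb);  /* whole struct, as before */
    RXECB(skb)->qp = qp;                    /* or one member directly */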

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_hdr.h  |  3 ++
 drivers/infiniband/sw/rxe/rxe_recv.c | 55 +++++++++++++---------------
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
index e432f9e37795..2a85d1e40e6a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hdr.h
+++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
@@ -36,6 +36,9 @@ static inline struct sk_buff *PKT_TO_SKB(struct rxe_pkt_info *pkt)
 	return container_of((void *)pkt, struct sk_buff, cb);
 }
 
+/* alternative to access a single element of rxe_pkt_info from skb */
+#define RXECB(skb) ((struct rxe_pkt_info *)((skb)->cb))
+
 /*
  * IBA header types and methods
  *
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 814a002b8911..10020103ea4a 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -107,17 +107,15 @@ static int check_keys(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
 	return -EINVAL;
 }
 
-static int check_addr(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
+static int check_addr(struct rxe_dev *rxe, struct sk_buff *skb,
 		      struct rxe_qp *qp)
 {
-	struct sk_buff *skb = PKT_TO_SKB(pkt);
-
 	if (qp_type(qp) != IB_QPT_RC && qp_type(qp) != IB_QPT_UC)
 		goto done;
 
-	if (unlikely(pkt->port_num != qp->attr.port_num)) {
+	if (unlikely(RXECB(skb)->port_num != qp->attr.port_num)) {
 		pr_warn_ratelimited("port %d != qp port %d\n",
-				    pkt->port_num, qp->attr.port_num);
+				    RXECB(skb)->port_num, qp->attr.port_num);
 		goto err1;
 	}
 
@@ -167,8 +165,9 @@ static int check_addr(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
 	return -EINVAL;
 }
 
-static int hdr_check(struct rxe_pkt_info *pkt)
+static int hdr_check(struct sk_buff *skb)
 {
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	struct rxe_dev *rxe = pkt->rxe;
 	struct rxe_port *port = &rxe->port;
 	struct rxe_qp *qp = NULL;
@@ -199,7 +198,7 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 		if (unlikely(err))
 			goto err2;
 
-		err = check_addr(rxe, pkt, qp);
+		err = check_addr(rxe, skb, qp);
 		if (unlikely(err))
 			goto err2;
 
@@ -222,17 +221,19 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 	return -EINVAL;
 }
 
-static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static inline void rxe_rcv_pkt(struct sk_buff *skb)
 {
-	if (pkt->mask & RXE_REQ_MASK)
-		rxe_resp_queue_pkt(pkt->qp, skb);
+	if (RXECB(skb)->mask & RXE_REQ_MASK)
+		rxe_resp_queue_pkt(RXECB(skb)->qp, skb);
 	else
-		rxe_comp_queue_pkt(pkt->qp, skb);
+		rxe_comp_queue_pkt(RXECB(skb)->qp, skb);
 }
 
-static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
+static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 {
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct sk_buff *s;
+	struct rxe_pkt_info *pkt = RXECB(skb);
+	struct rxe_dev *rxe = pkt->rxe;
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
@@ -274,26 +275,22 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 		 * the last QP in the list.
 		 */
 		if (mca->qp_list.next != &mcg->qp_list) {
-			struct sk_buff *cskb;
-			struct rxe_pkt_info *cpkt;
-
-			cskb = skb_clone(skb, GFP_ATOMIC);
-			if (unlikely(!cskb))
+			s = skb_clone(skb, GFP_ATOMIC);
+			if (unlikely(!s))
 				continue;
 
 			if (WARN_ON(!ib_device_try_get(&rxe->ib_dev))) {
-				kfree_skb(cskb);
+				kfree_skb(s);
 				break;
 			}
 
-			cpkt = SKB_TO_PKT(cskb);
-			cpkt->qp = qp;
+			RXECB(s)->qp = qp;
 			rxe_add_ref(qp);
-			rxe_rcv_pkt(cpkt, cskb);
+			rxe_rcv_pkt(s);
 		} else {
-			pkt->qp = qp;
+			RXECB(skb)->qp = qp;
 			rxe_add_ref(qp);
-			rxe_rcv_pkt(pkt, skb);
+			rxe_rcv_pkt(skb);
 			skb = NULL;	/* mark consumed */
 		}
 	}
@@ -326,7 +323,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
  */
 static int rxe_chk_dgid(struct rxe_dev *rxe, struct sk_buff *skb)
 {
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	const struct ib_gid_attr *gid_attr;
 	union ib_gid dgid;
 	union ib_gid *pdgid;
@@ -359,7 +356,7 @@ static int rxe_chk_dgid(struct rxe_dev *rxe, struct sk_buff *skb)
 void rxe_rcv(struct sk_buff *skb)
 {
 	int err;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	struct rxe_pkt_info *pkt = RXECB(skb);
 	struct rxe_dev *rxe = pkt->rxe;
 
 	if (unlikely(skb->len < RXE_BTH_BYTES))
@@ -378,7 +375,7 @@ void rxe_rcv(struct sk_buff *skb)
 	if (unlikely(skb->len < header_size(pkt)))
 		goto drop;
 
-	err = hdr_check(pkt);
+	err = hdr_check(skb);
 	if (unlikely(err))
 		goto drop;
 
@@ -389,9 +386,9 @@ void rxe_rcv(struct sk_buff *skb)
 	rxe_counter_inc(rxe, RXE_CNT_RCVD_PKTS);
 
 	if (unlikely(bth_qpn(pkt) == IB_MULTICAST_QPN))
-		rxe_rcv_mcast_pkt(rxe, skb);
+		rxe_rcv_mcast_pkt(skb);
 	else
-		rxe_rcv_pkt(pkt, skb);
+		rxe_rcv_pkt(skb);
 
 	return;
 
-- 
2.32.0



* [RFC PATCH v9 10/26] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe_rcv_mcast_pkt performs most of its work under
mcg->mcg_lock and calls into rxe_rcv, which queues the packets to the
responder and completer tasklets while still holding the lock. This is
a very bad idea. This patch walks the qp_list in the mcg and copies
the qp addresses to a dynamically allocated array under the lock, but
does the rest of the work without holding the lock. The critical
section is now very small.
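
The shape of the change, as a hedged sketch (deliver_to_qp() is an
illustrative stand-in for the existing per-qp receive path):

    static void rcv_mcast(struct rxe_mcg *mcg, struct sk_buff *skb,
                          struct rxe_qp **qp_array, int nmax)
    {
            struct rxe_mca *mca;
            int i, n = 0;

            /* phase 1: snapshot the attached qp's under the lock */
            spin_lock_bh(&mcg->mcg_lock);
            list_for_each_entry(mca, &mcg->qp_list, qp_list) {
                    qp_array[n++] = mca->qp;
                    if (n == nmax)
                            break;
            }
            spin_unlock_bh(&mcg->mcg_lock);

            /* phase 2: deliver packets with no lock held */
            for (i = 0; i < n; i++)
                    deliver_to_qp(qp_array[i], skb);
    }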

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 11 +++++----
 drivers/infiniband/sw/rxe/rxe_recv.c  | 33 +++++++++++++++++++++++----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index ed1b9ca65da3..3b66019fc26d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -113,16 +113,16 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		}
 	}
 
-	if (mcg->num_qp >= rxe->attr.max_mcast_qp_attach) {
+	if (atomic_read(&mcg->qp_num) >= rxe->attr.max_mcast_qp_attach) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	mcg->num_qp++;
+	atomic_inc(&mcg->qp_num);
 	new_mca->qp = qp;
 	atomic_inc(&qp->mcg_num);
 
-	list_add(&new_mca->qp_list, &mcg->qp_list);
+	list_add_tail(&new_mca->qp_list, &mcg->qp_list);
 
 	err = 0;
 out:
@@ -135,6 +135,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 {
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca, *tmp;
+	int n;
 
 	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
 	if (!mcg)
@@ -145,8 +146,8 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
 			list_del(&mca->qp_list);
-			mcg->num_qp--;
-			if (mcg->num_qp <= 0)
+			n = atomic_dec_return(&mcg->qp_num);
+			if (n <= 0)
 				rxe_drop_ref(mcg);
 			atomic_dec(&qp->mcg_num);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 10020103ea4a..41571c6b7d98 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -229,6 +229,11 @@ static inline void rxe_rcv_pkt(struct sk_buff *skb)
 		rxe_comp_queue_pkt(RXECB(skb)->qp, skb);
 }
 
+/* split processing of the qp list into two stages.
+ * first just make a simple linear array from the
+ * current list while holding the lock and then
+ * process each qp without holding the lock.
+ */
 static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 {
 	struct sk_buff *s;
@@ -237,7 +242,9 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	struct rxe_mcg *mcg;
 	struct rxe_mca *mca;
 	struct rxe_qp *qp;
+	struct rxe_qp **qp_array;
 	union ib_gid dgid;
+	int n, nmax;
 	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -251,15 +258,31 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
+	/* this is the current number of qp's attached to mcg plus a
+	 * little room in case new qp's are attached. It isn't wrong
+	 * to miss some qp's since it is just a matter of precisely
+	 * when the packet is assumed to be received.
+	 */
+	nmax = atomic_read(&mcg->qp_num) + 2;
+	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
+
+	n = 0;
 	spin_lock_bh(&mcg->mcg_lock);
+	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+		qp_array[n++] = mca->qp;
+		if (n == nmax)
+			break;
+	}
+	spin_unlock_bh(&mcg->mcg_lock);
+	nmax = n;
 
 	/* this is unreliable datagram service so we let
 	 * failures to deliver a multicast packet to a
 	 * single QP happen and just move on and try
 	 * the rest of them on the list
 	 */
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
-		qp = mca->qp;
+	for (n = 0; n < nmax; n++) {
+		qp = qp_array[n];
 
 		/* validate qp for incoming packet */
 		err = check_type_state(rxe, pkt, qp);
@@ -274,8 +297,8 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 		 * skb and pass to the QP. Pass the original skb to
 		 * the last QP in the list.
 		 */
-		if (mca->qp_list.next != &mcg->qp_list) {
-			s = skb_clone(skb, GFP_ATOMIC);
+		if (n < nmax - 1) {
+			s = skb_clone(skb, GFP_KERNEL);
 			if (unlikely(!s))
 				continue;
 
@@ -295,7 +318,7 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 		}
 	}
 
-	spin_unlock_bh(&mcg->mcg_lock);
+	kfree(qp_array);
 
 	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 02745d51c163..d65c358798c6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -356,8 +356,8 @@ struct rxe_mcg {
 	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
+	atomic_t		qp_num;
 	union ib_gid		mgid;
-	int			num_qp;
 	u32			qkey;
 	u16			pkey;
 };
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 11/26] RDMA/rxe: Replace locks by rxe->mcg_lock
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (9 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 10/26] RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

As a first step in decoupling mcg from rxe pools, replace the spinlock
mcg->mcg_lock and the write lock pool->pool_lock with a single
per-device lock, rxe->mcg_lock.
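
A minimal sketch of the resulting pattern (illustrative only;
rxe_for_each_attached_qp() is a hypothetical helper, not part of this
patch):

	/* one per-device lock now guards the tree and all qp lists */
	static void rxe_for_each_attached_qp(struct rxe_dev *rxe,
					     struct rxe_mcg *mcg)
	{
		struct rxe_mca *mca;

		spin_lock_bh(&rxe->mcg_lock);	/* was mcg->mcg_lock */
		list_for_each_entry(mca, &mcg->qp_list, qp_list) {
			/* act on mca->qp under the per-device lock */
		}
		spin_unlock_bh(&rxe->mcg_lock);
	}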

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_mcast.c | 25 ++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  3 ++-
 4 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index c55736e441e7..46a07e2d9dcf 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -198,6 +198,8 @@ static int rxe_init(struct rxe_dev *rxe)
 	if (err)
 		return err;
 
+	spin_lock_init(&rxe->mcg_lock);
+
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
 	spin_lock_init(&rxe->pending_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 3b66019fc26d..62ace10206b0 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,7 +25,7 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold mc_grp_pool->pool_lock */
+/* caller should hold rxe->mcg_lock */
 static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 				     struct rxe_pool *pool,
 				     union ib_gid *mgid)
@@ -39,7 +39,6 @@ static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
 	rxe_add_ref(mcg);
 
 	INIT_LIST_HEAD(&mcg->qp_list);
-	spin_lock_init(&mcg->mcg_lock);
 	mcg->rxe = rxe;
 	rxe_add_key_locked(mcg, mgid);
 
@@ -63,7 +62,7 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	if (rxe->attr.max_mcast_qp_attach == 0)
 		return -EINVAL;
 
-	write_lock_bh(&pool->pool_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	mcg = rxe_pool_get_key_locked(pool, mgid);
 	if (mcg)
@@ -71,13 +70,13 @@ static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	mcg = create_grp(rxe, pool, mgid);
 	if (IS_ERR(mcg)) {
-		write_unlock_bh(&pool->pool_lock);
+		spin_unlock_bh(&rxe->mcg_lock);
 		err = PTR_ERR(mcg);
 		return err;
 	}
 
 done:
-	write_unlock_bh(&pool->pool_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	*mcgp = mcg;
 	return 0;
 }
@@ -89,21 +88,21 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *mca, *new_mca;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&mcg->mcg_lock);
+			spin_unlock_bh(&rxe->mcg_lock);
 			return 0;
 		}
 	}
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
 	if (!new_mca)
 		return -ENOMEM;
 
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	/* re-check to see if someone else just attached qp */
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
@@ -126,7 +125,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	err = 0;
 out:
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	return err;
 }
 
@@ -141,7 +140,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	if (!mcg)
 		goto err1;
 
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
@@ -151,14 +150,14 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 				rxe_drop_ref(mcg);
 			atomic_dec(&qp->mcg_num);
 
-			spin_unlock_bh(&mcg->mcg_lock);
+			spin_unlock_bh(&rxe->mcg_lock);
 			rxe_drop_ref(mcg);	/* ref from get_key */
 			kfree(mca);
 			return 0;
 		}
 	}
 
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	rxe_drop_ref(mcg);			/* ref from get_key */
 err1:
 	return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 41571c6b7d98..11246589fda7 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -267,13 +267,13 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
 
 	n = 0;
-	spin_lock_bh(&mcg->mcg_lock);
+	spin_lock_bh(&rxe->mcg_lock);
 	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
 		qp_array[n++] = mca->qp;
 		if (n == nmax)
 			break;
 	}
-	spin_unlock_bh(&mcg->mcg_lock);
+	spin_unlock_bh(&rxe->mcg_lock);
 	nmax = n;
 
 	/* this is unreliable datagram service so we let
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d65c358798c6..b72f8f09d984 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,7 +353,6 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
-	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
@@ -397,6 +396,8 @@ struct rxe_dev {
 	struct rxe_pool		mw_pool;
 	struct rxe_pool		mc_grp_pool;
 
+	spinlock_t		mcg_lock; /* guard multicast groups */
+
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (10 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 11/26] RDMA/rxe: Replace locks by rxe->mcg_lock Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-28 18:32   ` Jason Gunthorpe
  2022-01-27 21:37 ` [RFC PATCH v9 13/26] RDMA/rxe: Remove key'ed object support Bob Pearson
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Continuing to decouple mcg from rxe pools, create red-black tree code
in rxe_mcast.c to index mcg's by mgid.
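
To avoid allocating while holding the new spinlock, rxe_get_mcg() below
uses a speculative allocation: look up first, allocate with GFP_KERNEL
outside the lock, then re-check for a concurrent insert under the lock.
In outline (an illustrative condensation of the code in this patch,
error handling elided):

	struct rxe_mcg *mcg, *tmp;

	mcg = rxe_lookup_mcg(rxe, mgid);	/* fast path, takes a ref */
	if (mcg)
		return mcg;

	mcg = rxe_alloc(pool);			/* may sleep, no lock held */

	spin_lock_bh(&rxe->mcg_lock);
	tmp = __rxe_lookup_mcg(rxe, mgid);	/* did someone beat us? */
	if (tmp) {
		spin_unlock_bh(&rxe->mcg_lock);
		rxe_drop_ref(mcg);		/* discard our copy */
		return tmp;
	}
	__rxe_insert_mcg(mcg);
	spin_unlock_bh(&rxe->mcg_lock);
	return mcg;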

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   1 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |   3 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 187 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_recv.c  |   4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   3 +
 5 files changed, 159 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 46a07e2d9dcf..310e184ae9e8 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -199,6 +199,7 @@ static int rxe_init(struct rxe_dev *rxe)
 		return err;
 
 	spin_lock_init(&rxe->mcg_lock);
+	rxe->mcg_tree = RB_ROOT;
 
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index af40e3c212fb..d9faf3a1ee61 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,9 +40,10 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
+void rxe_mc_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 62ace10206b0..4c3eb9c723b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -25,60 +25,172 @@ static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_del(rxe->ndev, ll_addr);
 }
 
-/* caller should hold rxe->mcg_lock */
-static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
-				     struct rxe_pool *pool,
-				     union ib_gid *mgid)
+/**
+ * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
+ * @mcg: mcast group object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * is responsible for not adding the same mcg twice to the tree.
+ */
+static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 {
-	int err;
+	struct rb_root *tree = &mcg->rxe->mcg_tree;
+	struct rb_node **link = &tree->rb_node;
+	struct rb_node *node = NULL;
+	struct rxe_mcg *tmp;
+	int cmp;
+
+	while (*link) {
+		node = *link;
+		tmp = rb_entry(node, struct rxe_mcg, node);
+
+		cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid));
+		if (cmp > 0)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&mcg->node, node, link);
+	rb_insert_color(&mcg->node, tree);
+}
+
+/**
+ * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
+ * @mcg: mcast group object with an embedded red-black tree node
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_remove_mcg(struct rxe_mcg *mcg)
+{
+	rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
+}
+
+/**
+ * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Context: caller must hold rxe->mcg_lock
+ * Returns: mcg (with a reference taken) if found, else NULL
+ */
+static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
+					union ib_gid *mgid)
+{
+	struct rb_root *tree = &rxe->mcg_tree;
 	struct rxe_mcg *mcg;
+	struct rb_node *node;
+	int cmp;
 
-	mcg = rxe_alloc_locked(&rxe->mc_grp_pool);
-	if (!mcg)
-		return ERR_PTR(-ENOMEM);
-	rxe_add_ref(mcg);
+	node = tree->rb_node;
 
-	INIT_LIST_HEAD(&mcg->qp_list);
-	mcg->rxe = rxe;
-	rxe_add_key_locked(mcg, mgid);
+	while (node) {
+		mcg = rb_entry(node, struct rxe_mcg, node);
 
-	err = rxe_mcast_add(rxe, mgid);
-	if (unlikely(err)) {
-		rxe_drop_key_locked(mcg);
-		rxe_drop_ref(mcg);
-		return ERR_PTR(err);
+		cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));
+
+		if (cmp > 0)
+			node = node->rb_left;
+		else if (cmp < 0)
+			node = node->rb_right;
+		else
+			break;
 	}
 
-	return mcg;
+	if (node) {
+		rxe_add_ref(mcg);
+		return mcg;
+	}
+
+	return NULL;
 }
 
-static int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-			     struct rxe_mcg **mcgp)
+/**
+ * rxe_lookup_mcg - look up mcg in red-black tree
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ *
+ * Returns: mcg if found else NULL
+ */
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-	int err;
 	struct rxe_mcg *mcg;
+
+	spin_lock_bh(&rxe->mcg_lock);
+	mcg = __rxe_lookup_mcg(rxe, mgid);
+	spin_unlock_bh(&rxe->mcg_lock);
+
+	return mcg;
+}
+
+/**
+ * rxe_get_mcg - look up or allocate an mcg
+ * @rxe: rxe device object
+ * @mgid: multicast IP address
+ * @mcgp: address of returned mcg value
+ *
+ * Takes one ref on the mcg if it already exists, else takes two;
+ * the second ref is dropped when qp_num goes to zero.
+ *
+ * Returns: 0 and sets *mcgp to mcg on success else an error
+ */
+static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
+		       struct rxe_mcg **mcgp)
+{
+	struct rxe_mcg *mcg, *tmp;
+	int ret;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
-	if (rxe->attr.max_mcast_qp_attach == 0)
+	if (rxe->attr.max_mcast_grp == 0)
 		return -EINVAL;
 
-	spin_lock_bh(&rxe->mcg_lock);
+	/* check to see if mcg already exists */
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (mcg) {
+		*mcgp = mcg;
+		return 0;
+	}
 
-	mcg = rxe_pool_get_key_locked(pool, mgid);
-	if (mcg)
-		goto done;
+	/* speculative alloc of mcg without using GFP_ATOMIC */
+	mcg = rxe_alloc(pool);
+	if (!mcg)
+		return -ENOMEM;
 
-	mcg = create_grp(rxe, pool, mgid);
-	if (IS_ERR(mcg)) {
+	spin_lock_bh(&rxe->mcg_lock);
+	/* re-check to see if someone else just added it */
+	tmp = __rxe_lookup_mcg(rxe, mgid);
+	if (tmp) {
 		spin_unlock_bh(&rxe->mcg_lock);
-		err = PTR_ERR(mcg);
-		return err;
+		rxe_drop_ref(mcg);
+		mcg = tmp;
+		goto out;
 	}
 
-done:
+	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp)
+		goto err_dec;
+
+	ret = rxe_mcast_add(rxe, mgid);
+	if (ret)
+		goto err_out;
+
+	rxe_add_ref(mcg);
+	mcg->rxe = rxe;
+	memcpy(&mcg->mgid, mgid, sizeof(*mgid));
+	INIT_LIST_HEAD(&mcg->qp_list);
+	__rxe_insert_mcg(mcg);
 	spin_unlock_bh(&rxe->mcg_lock);
+out:
 	*mcgp = mcg;
 	return 0;
+
+err_dec:
+	atomic_dec(&rxe->mcg_num);
+	ret = -ENOMEM;
+err_out:
+	spin_unlock_bh(&rxe->mcg_lock);
+	rxe_drop_ref(mcg);
+	return ret;
 }
 
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
@@ -136,7 +248,7 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	struct rxe_mca *mca, *tmp;
 	int n;
 
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
+	mcg = rxe_lookup_mcg(rxe, mgid);
 	if (!mcg)
 		goto err1;
 
@@ -151,14 +263,14 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&rxe->mcg_lock);
-			rxe_drop_ref(mcg);	/* ref from get_key */
+			rxe_drop_ref(mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);			/* ref from get_key */
+	rxe_drop_ref(mcg);
 err1:
 	return -EINVAL;
 }
@@ -168,7 +280,10 @@ void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 	struct rxe_mcg *mcg = container_of(elem, typeof(*mcg), elem);
 	struct rxe_dev *rxe = mcg->rxe;
 
-	rxe_drop_key(mcg);
+	spin_lock_bh(&rxe->mcg_lock);
+	__rxe_remove_mcg(mcg);
+	spin_unlock_bh(&rxe->mcg_lock);
+
 	rxe_mcast_delete(rxe, &mcg->mgid);
 }
 
@@ -180,7 +295,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	struct rxe_mcg *mcg;
 
 	/* takes a ref on mcg if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &mcg);
+	err = rxe_get_mcg(rxe, mgid, &mcg);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 11246589fda7..f1ca83e09160 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -254,7 +254,7 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 		memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid));
 
 	/* lookup mcast group corresponding to mgid, takes a ref */
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
+	mcg = rxe_lookup_mcg(rxe, &dgid);
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
@@ -320,7 +320,7 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 
 	kfree(qp_array);
 
-	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
+	rxe_drop_ref(mcg);
 
 	if (likely(!skb))
 		return;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index b72f8f09d984..ea2d9ff29744 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -353,6 +353,7 @@ struct rxe_mw {
 
 struct rxe_mcg {
 	struct rxe_pool_elem	elem;
+	struct rb_node		node;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
@@ -397,6 +398,8 @@ struct rxe_dev {
 	struct rxe_pool		mc_grp_pool;
 
 	spinlock_t		mcg_lock; /* guard multicast groups */
+	struct rb_root		mcg_tree;
+	atomic_t		mcg_num;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 13/26] RDMA/rxe: Remove key'ed object support
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (11 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 14/26] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Now that rxe_mcast.c has its own red-black tree support there is no
longer any requirement for key'ed objects in rxe pools. This patch
removes the key APIs and related code.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 126 ---------------------------
 drivers/infiniband/sw/rxe/rxe_pool.h |  38 --------
 2 files changed, 164 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index a6756aa93e2b..673b29f1f12c 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -16,8 +16,6 @@ static const struct rxe_type_info {
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
-	size_t key_offset;
-	size_t key_size;
 } rxe_type_info[RXE_NUM_TYPES] = {
 	[RXE_TYPE_UC] = {
 		.name		= "rxe-uc",
@@ -86,9 +84,6 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mcg),
 		.elem_offset	= offsetof(struct rxe_mcg, elem),
 		.cleanup	= rxe_mc_cleanup,
-		.flags		= RXE_POOL_KEY,
-		.key_offset	= offsetof(struct rxe_mcg, mgid),
-		.key_size	= sizeof(union ib_gid),
 	},
 };
 
@@ -147,12 +142,6 @@ int rxe_pool_init(
 			goto out;
 	}
 
-	if (pool->flags & RXE_POOL_KEY) {
-		pool->key.tree = RB_ROOT;
-		pool->key.key_offset = info->key_offset;
-		pool->key.key_size = info->key_size;
-	}
-
 out:
 	return err;
 }
@@ -209,77 +198,6 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 	return 0;
 }
 
-static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->key.tree.rb_node;
-	struct rb_node *parent = NULL;
-	struct rxe_pool_elem *elem;
-	int cmp;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     (u8 *)new + pool->key.key_offset,
-			     pool->key.key_size);
-
-		if (cmp == 0) {
-			pr_warn("key already exists!\n");
-			return -EINVAL;
-		}
-
-		if (cmp > 0)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->key_node, parent, link);
-	rb_insert_color(&new->key_node, &pool->key.tree);
-
-	return 0;
-}
-
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size);
-	err = rxe_insert_key(pool, elem);
-
-	return err;
-}
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_key_locked(elem, key);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	rb_erase(&elem->key_node, &pool->key.tree);
-}
-
-void __rxe_drop_key(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_key_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-}
-
 int __rxe_add_index_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
@@ -443,47 +361,3 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 
 	return obj;
 }
-
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
-{
-	struct rb_node *node;
-	struct rxe_pool_elem *elem;
-	void *obj;
-	int cmp;
-
-	node = pool->key.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     key, pool->key.key_size);
-
-		if (cmp > 0)
-			node = node->rb_left;
-		else if (cmp < 0)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
-
-	return obj;
-}
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
-{
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_key_locked(pool, key);
-	read_unlock_bh(&pool->pool_lock);
-
-	return obj;
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 511f81554fd1..b6de415e10d2 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -9,7 +9,6 @@
 
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_KEY		= BIT(2),
 	RXE_POOL_NO_ALLOC	= BIT(4),
 };
 
@@ -32,9 +31,6 @@ struct rxe_pool_elem {
 	struct kref		ref_cnt;
 	struct list_head	list;
 
-	/* only used if keyed */
-	struct rb_node		key_node;
-
 	/* only used if indexed */
 	struct rb_node		index_node;
 	u32			index;
@@ -61,13 +57,6 @@ struct rxe_pool {
 		u32			max_index;
 		u32			min_index;
 	} index;
-
-	/* only used if keyed */
-	struct {
-		struct rb_root		tree;
-		size_t			key_offset;
-		size_t			key_size;
-	} key;
 };
 
 /* initialize a pool of objects with given limit on
@@ -112,26 +101,6 @@ void __rxe_drop_index(struct rxe_pool_elem *elem);
 
 #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
-/* assign a key to a keyed object and insert object into
- * pool's rb tree holding and not holding pool_lock
- */
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key)
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key)
-
-/* remove elem from rb tree holding and not holding the pool_lock */
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->elem)
-
-void __rxe_drop_key(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
-
 /* lookup an indexed object from index holding and not holding the pool_lock.
  * takes a reference on object
  */
@@ -139,13 +108,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
 
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* lookup keyed object from key holding and not holding the pool_lock.
- * takes a reference on the objecti
- */
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key);
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
-
 /* cleanup an object when all references are dropped */
 void rxe_elem_release(struct kref *kref);
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 14/26] RDMA/rxe: Remove mcg from rxe pools
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (12 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 13/26] RDMA/rxe: Remove key'ed object support Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 15/26] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Finish removing mcg from rxe pools. Replace rxe pool reference
counting with krefs and rxe_alloc() with kzalloc().
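
The mcg lifetime now follows the standard kref pattern (see
Documentation/core-api/kref.rst); in outline (illustrative):

	kref_init(&mcg->ref_cnt);	/* refcount = 1 at create */
	kref_get(&mcg->ref_cnt);	/* second ref held until last detach */
	/* ... mcg in use ... */
	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);	/* release runs at zero */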

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  8 ---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 76 ++++++++++++++++++---------
 drivers/infiniband/sw/rxe/rxe_pool.c  |  6 ---
 drivers/infiniband/sw/rxe/rxe_pool.h  |  1 -
 drivers/infiniband/sw/rxe/rxe_recv.c  |  4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +-
 7 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 310e184ae9e8..c560d467a972 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -28,7 +28,6 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->cq_pool);
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
 
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
@@ -157,15 +156,8 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 	if (err)
 		goto err8;
 
-	err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
-			    rxe->attr.max_mcast_grp);
-	if (err)
-		goto err9;
-
 	return 0;
 
-err9:
-	rxe_pool_cleanup(&rxe->mw_pool);
 err8:
 	rxe_pool_cleanup(&rxe->mr_pool);
 err7:
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index d9faf3a1ee61..409efeecd581 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -43,7 +43,7 @@ void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
-void rxe_mc_cleanup(struct rxe_pool_elem *arg);
+void rxe_cleanup_mcg(struct kref *kref);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 4c3eb9c723b4..d01456052879 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -98,7 +98,7 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
 	}
 
 	if (node) {
-		rxe_add_ref(mcg);
+		kref_get(&mcg->ref_cnt);
 		return mcg;
 	}
 
@@ -139,7 +139,6 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 {
 	struct rxe_mcg *mcg, *tmp;
 	int ret;
-	struct rxe_pool *pool = &rxe->mc_grp_pool;
 
 	if (rxe->attr.max_mcast_grp == 0)
 		return -EINVAL;
@@ -152,7 +151,7 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	}
 
 	/* speculative alloc of mcg without using GFP_ATOMIC */
-	mcg = rxe_alloc(pool);
+	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
 	if (!mcg)
 		return -ENOMEM;
 
@@ -161,19 +160,22 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	tmp = __rxe_lookup_mcg(rxe, mgid);
 	if (tmp) {
 		spin_unlock_bh(&rxe->mcg_lock);
-		rxe_drop_ref(mcg);
+		kfree(mcg);
 		mcg = tmp;
 		goto out;
 	}
 
-	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp)
+	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
+		ret = -ENOMEM;
 		goto err_dec;
+	}
 
 	ret = rxe_mcast_add(rxe, mgid);
 	if (ret)
-		goto err_out;
+		goto err_dec;
 
-	rxe_add_ref(mcg);
+	kref_init(&mcg->ref_cnt);
+	kref_get(&mcg->ref_cnt);
 	mcg->rxe = rxe;
 	memcpy(&mcg->mgid, mgid, sizeof(*mgid));
 	INIT_LIST_HEAD(&mcg->qp_list);
@@ -186,13 +188,47 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 
 err_dec:
 	atomic_dec(&rxe->mcg_num);
-	ret = -ENOMEM;
-err_out:
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);
+	kfree(mcg);
 	return ret;
 }
 
+/**
+ * __rxe_cleanup_mcg - cleanup mcg object holding lock
+ * @kref: kref embedded in mcg object
+ *
+ * Context: caller has put all references to mcg
+ * caller should hold rxe->mcg_lock
+ */
+static void __rxe_cleanup_mcg(struct kref *kref)
+{
+	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+	struct rxe_dev *rxe = mcg->rxe;
+
+	__rxe_remove_mcg(mcg);
+	rxe_mcast_delete(rxe, &mcg->mgid);
+	atomic_dec(&rxe->mcg_num);
+
+	kfree(mcg);
+}
+
+/**
+ * rxe_cleanup_mcg - cleanup mcg object
+ * @kref: kref embedded in mcg object
+ *
+ * Context: caller has put all references to mcg and no one should be
+ * able to get another one
+ */
+void rxe_cleanup_mcg(struct kref *kref)
+{
+	struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+	struct rxe_dev *rxe = mcg->rxe;
+
+	spin_lock_bh(&rxe->mcg_lock);
+	__rxe_cleanup_mcg(kref);
+	spin_unlock_bh(&rxe->mcg_lock);
+}
+
 static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mcg *mcg)
 {
@@ -259,34 +295,22 @@ static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			list_del(&mca->qp_list);
 			n = atomic_dec_return(&mcg->qp_num);
 			if (n <= 0)
-				rxe_drop_ref(mcg);
+				kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
 			atomic_dec(&qp->mcg_num);
 
 			spin_unlock_bh(&rxe->mcg_lock);
-			rxe_drop_ref(mcg);
+			kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&rxe->mcg_lock);
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 err1:
 	return -EINVAL;
 }
 
-void rxe_mc_cleanup(struct rxe_pool_elem *elem)
-{
-	struct rxe_mcg *mcg = container_of(elem, typeof(*mcg), elem);
-	struct rxe_dev *rxe = mcg->rxe;
-
-	spin_lock_bh(&rxe->mcg_lock);
-	__rxe_remove_mcg(mcg);
-	spin_unlock_bh(&rxe->mcg_lock);
-
-	rxe_mcast_delete(rxe, &mcg->mgid);
-}
-
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	int err;
@@ -301,7 +325,7 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
 
-	rxe_drop_ref(mcg);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 	return err;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 673b29f1f12c..b6fe7c93aaab 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -79,12 +79,6 @@ static const struct rxe_type_info {
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
-	[RXE_TYPE_MC_GRP] = {
-		.name		= "rxe-mc_grp",
-		.size		= sizeof(struct rxe_mcg),
-		.elem_offset	= offsetof(struct rxe_mcg, elem),
-		.cleanup	= rxe_mc_cleanup,
-	},
 };
 
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index b6de415e10d2..99b1eb04b405 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -21,7 +21,6 @@ enum rxe_elem_type {
 	RXE_TYPE_CQ,
 	RXE_TYPE_MR,
 	RXE_TYPE_MW,
-	RXE_TYPE_MC_GRP,
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index f1ca83e09160..357a6cea1484 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -274,6 +274,8 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 			break;
 	}
 	spin_unlock_bh(&rxe->mcg_lock);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
 	nmax = n;
 
 	/* this is unreliable datagram service so we let
@@ -320,8 +322,6 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 
 	kfree(qp_array);
 
-	rxe_drop_ref(mcg);
-
 	if (likely(!skb))
 		return;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ea2d9ff29744..dea24ebdb3d0 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,8 +352,8 @@ struct rxe_mw {
 };
 
 struct rxe_mcg {
-	struct rxe_pool_elem	elem;
 	struct rb_node		node;
+	struct kref		ref_cnt;
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
 	atomic_t		qp_num;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 15/26] RDMA/rxe: Add code to cleanup mcast memory
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (13 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 14/26] RDMA/rxe: Remove mcg from rxe pools Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 16/26] RDMA/rxe: Add comments to rxe_mcast.c Bob Pearson
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Well-behaved applications will free all memory allocated by multicast,
but programs which do not clean up properly can leave behind allocated
memory when the rxe driver is unloaded. This patch walks the red-black
tree holding multicast group elements and then walks each mcg's list of
attached qp's, freeing the mca's and finally the mcg's.
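
The walk uses the usual idiom for tearing down an rb-tree: fetch the
successor before erasing the current node. A minimal sketch
(illustrative, locking elided; root is &rxe->mcg_tree):

	struct rb_node *node, *next;
	struct rxe_mcg *mcg;

	for (node = rb_first(root); node; node = next) {
		next = rb_next(node);	/* before node is erased */
		mcg = rb_entry(node, struct rxe_mcg, node);
		rb_erase(node, root);
		kfree(mcg);
	}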

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  2 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  1 +
 drivers/infiniband/sw/rxe/rxe_mcast.c | 31 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index c560d467a972..74c5521e9b3d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -29,6 +29,8 @@ void rxe_dealloc(struct ib_device *ib_dev)
 	rxe_pool_cleanup(&rxe->mr_pool);
 	rxe_pool_cleanup(&rxe->mw_pool);
 
+	rxe_cleanup_mcast(rxe);
+
 	if (rxe->tfm)
 		crypto_free_shash(rxe->tfm);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 409efeecd581..0bc1b7e2877c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -44,6 +44,7 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 void rxe_cleanup_mcg(struct kref *kref);
+void rxe_cleanup_mcast(struct rxe_dev *rxe);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index d01456052879..49cc1ad05bba 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -336,3 +336,34 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
 }
+
+/**
+ * rxe_cleanup_mcast - cleanup all resources held by mcast
+ * @rxe: rxe object
+ *
+ * Called when rxe device is unloaded. Walk red-black tree to
+ * find all mcg's and then walk mcg->qp_list to find all mca's and
+ * free them. These should have been freed already if apps are
+ * well behaved.
+ */
+void rxe_cleanup_mcast(struct rxe_dev *rxe)
+{
+	struct rb_root *root = &rxe->mcg_tree;
+	struct rb_node *node, *next;
+	struct rxe_mcg *mcg;
+	struct rxe_mca *mca, *tmp;
+
+	for (node = rb_first(root); node; node = next) {
+		next = rb_next(node);
+		mcg = rb_entry(node, typeof(*mcg), node);
+
+		spin_lock_bh(&rxe->mcg_lock);
+		list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list)
+			kfree(mca);
+
+		__rxe_remove_mcg(mcg);
+		spin_unlock_bh(&rxe->mcg_lock);
+
+		kfree(mcg);
+	}
+}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 16/26] RDMA/rxe: Add comments to rxe_mcast.c
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (14 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 15/26] RDMA/rxe: Add code to cleanup mcast memory Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 17/26] RDMA/rxe: Separate code into subroutines Bob Pearson
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Add comments to rxe_mcast.c.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 42 ++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 49cc1ad05bba..77f166a5d5c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -1,12 +1,45 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
+ * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved.
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
+/*
+ * rxe_mcast.c implements driver support for multicast transport.
+ * It is based on two data structures: struct rxe_mcg ('mcg') and
+ * struct rxe_mca ('mca'). An mcg is allocated when a qp first
+ * attaches to a new mgid. The mcg's are indexed by
+ * a red-black tree using the mgid. This data structure is searched
+ * for the mcg when a multicast packet is received and when another
+ * qp is attached to the same mgid. It is cleaned up when the last qp
+ * is detached from the mcg. Each time a qp is attached to an mcg an
+ * mca is created. It holds a pointer to the qp and is added to a list
+ * of qp's that are attached to the mcg. The qp_list is used to replicate
+ * mcast packets in the rxe receive path.
+ *
+ * Each mcg keeps a count of the number of qp's attached and once the
+ * count goes to zero it must be cleaned up. mcg's also have a reference
+ * count. While InfiniBand multicast groups are created and destroyed
+ * by explicit MADs, for rxe devices this is implicit: the mcg
+ * is created by the first qp attach and destroyed by the last qp detach.
+ * To implement this there is some hysteresis with an extra kref_get when
+ * the mcg is created and an extra kref_put when the qp count decreases
+ * to zero.
+ *
+ * The qp list and the red-black tree are protected by a single
+ * rxe->mcg_lock per device.
+ */
+
 #include "rxe.h"
-#include "rxe_loc.h"
 
+/**
+ * rxe_mcast_add - add multicast address to rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
@@ -16,6 +49,13 @@ static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
 	return dev_mc_add(rxe->ndev, ll_addr);
 }
 
+/**
+ * rxe_mcast_delete - delete multicast address from rxe device
+ * @rxe: rxe device object
+ * @mgid: multicast address as a gid
+ *
+ * Returns 0 on success else an error
+ */
 static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
 {
 	unsigned char ll_addr[ETH_ALEN];
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 17/26] RDMA/rxe: Separate code into subroutines
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (15 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 16/26] RDMA/rxe: Add comments to rxe_mcast.c Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 18/26] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Clean up rxe_mcast.c code by separating initialization and cleanup
of mca objects into subroutines. Add the remaining documentation
comments.
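
One detail worth noting: __rxe_init_mca() below enforces the attach
limits with an increment-then-check pattern that backs out on failure.
In outline (illustrative):

	int n;

	n = atomic_inc_return(&mcg->qp_num);
	if (n > rxe->attr.max_mcast_qp_attach) {
		atomic_dec(&mcg->qp_num);	/* back out */
		return -ENOMEM;
	}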

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 162 +++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 2 files changed, 121 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 77f166a5d5c8..865e6e85084f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -178,7 +178,7 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 		       struct rxe_mcg **mcgp)
 {
 	struct rxe_mcg *mcg, *tmp;
-	int ret;
+	int err;
 
 	if (rxe->attr.max_mcast_grp == 0)
 		return -EINVAL;
@@ -206,12 +206,12 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	}
 
 	if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
-		ret = -ENOMEM;
+		err = -ENOMEM;
 		goto err_dec;
 	}
 
-	ret = rxe_mcast_add(rxe, mgid);
-	if (ret)
+	err = rxe_mcast_add(rxe, mgid);
+	if (err)
 		goto err_dec;
 
 	kref_init(&mcg->ref_cnt);
@@ -230,7 +230,7 @@ static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	atomic_dec(&rxe->mcg_num);
 	spin_unlock_bh(&rxe->mcg_lock);
 	kfree(mcg);
-	return ret;
+	return err;
 }
 
 /**
@@ -269,11 +269,59 @@ void rxe_cleanup_mcg(struct kref *kref)
 	spin_unlock_bh(&rxe->mcg_lock);
 }
 
-static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			   struct rxe_mcg *mcg)
+/**
+ * __rxe_init_mca - initialize a new mca holding lock
+ * @qp: qp object
+ * @mcg: mcg object
+ * @mca: empty space for new mca
+ *
+ * Context: caller must hold references on qp and mcg and hold
+ * rxe->mcg_lock, and must pass in memory for the new mca
+ *
+ * Returns: 0 on success else an error
+ */
+static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
+			  struct rxe_mca *mca)
 {
-	int err;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	int n;
+
+	n = atomic_inc_return(&rxe->mcg_attach);
+	if (n > rxe->attr.max_total_mcast_qp_attach) {
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	n = atomic_inc_return(&mcg->qp_num);
+	if (n > rxe->attr.max_mcast_qp_attach) {
+		atomic_dec(&mcg->qp_num);
+		atomic_dec(&rxe->mcg_attach);
+		return -ENOMEM;
+	}
+
+	atomic_inc(&qp->mcg_num);
+
+	rxe_add_ref(qp);
+	mca->qp = qp;
+
+	list_add_tail(&mca->qp_list, &mcg->qp_list);
+
+	return 0;
+}
+
+/**
+ * rxe_attach_mcg - attach qp to mcg if not already attached
+ * @mcg: mcg object
+ * @qp: qp object
+ *
+ * Context: caller must hold reference on qp and mcg.
+ * Returns: 0 on success else an error
+ */
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+{
+	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca, *new_mca;
+	int err;
 
 	/* check to see if the qp is already a member of the group */
 	spin_lock_bh(&rxe->mcg_lock);
@@ -296,61 +344,74 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		if (mca->qp == qp) {
 			kfree(new_mca);
 			err = 0;
-			goto out;
+			goto done;
 		}
 	}
 
-	if (atomic_read(&mcg->qp_num) >= rxe->attr.max_mcast_qp_attach) {
-		err = -ENOMEM;
-		goto out;
-	}
+	mca = new_mca;
+	err = __rxe_init_mca(qp, mcg, mca);
+	if (err)
+		kfree(mca);
+done:
+	spin_unlock_bh(&rxe->mcg_lock);
 
-	atomic_inc(&mcg->qp_num);
-	new_mca->qp = qp;
-	atomic_inc(&qp->mcg_num);
+	return err;
+}
+
+/**
+ * __rxe_cleanup_mca - cleanup mca object holding lock
+ * @mca: mca object
+ * @mcg: mcg object
+ *
+ * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ */
+static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
+{
+	list_del(&mca->qp_list);
 
-	list_add_tail(&new_mca->qp_list, &mcg->qp_list);
+	atomic_dec(&mcg->qp_num);
+	atomic_dec(&mcg->rxe->mcg_attach);
+	atomic_dec(&mca->qp->mcg_num);
 
-	err = 0;
-out:
-	spin_unlock_bh(&rxe->mcg_lock);
-	return err;
+	rxe_drop_ref(mca->qp);
 }
 
-static int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-				   union ib_gid *mgid)
+/**
+ * rxe_detach_mcg - detach qp from mcg
+ * @mcg: mcg object
+ * @qp: qp object
+ *
+ * Returns: 0 on success else an error if qp is not attached.
+ */
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
-	struct rxe_mcg *mcg;
+	struct rxe_dev *rxe = mcg->rxe;
 	struct rxe_mca *mca, *tmp;
-	int n;
-
-	mcg = rxe_lookup_mcg(rxe, mgid);
-	if (!mcg)
-		goto err1;
 
 	spin_lock_bh(&rxe->mcg_lock);
-
 	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			list_del(&mca->qp_list);
-			n = atomic_dec_return(&mcg->qp_num);
-			if (n <= 0)
+			__rxe_cleanup_mca(mca, mcg);
+			if (atomic_read(&mcg->qp_num) <= 0)
 				kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
-			atomic_dec(&qp->mcg_num);
-
 			spin_unlock_bh(&rxe->mcg_lock);
-			kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
 			kfree(mca);
 			return 0;
 		}
 	}
-
 	spin_unlock_bh(&rxe->mcg_lock);
-	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-err1:
+
 	return -EINVAL;
 }
 
+/**
+ * rxe_attach_mcast - attach qp to multicast group (see IBA-11.3.1)
+ * @ibqp: (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	int err;
@@ -363,18 +424,35 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 	if (err)
 		return err;
 
-	err = rxe_mcast_add_grp_elem(rxe, qp, mcg);
-
+	err = rxe_attach_mcg(mcg, qp);
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
 	return err;
 }
 
+/**
+ * rxe_detach_mcast - detach qp from multicast group (see IBA-11.3.2)
+ * @ibqp: address of (IB) qp object
+ * @mgid: multicast IP address
+ * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6)
+ *
+ * Returns: 0 on success else an errno
+ */
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mcg *mcg;
+	int err;
+
+	mcg = rxe_lookup_mcg(rxe, mgid);
+	if (!mcg)
+		return -EINVAL;
 
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+	err = rxe_detach_mcg(mcg, qp);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+
+	return err;
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index dea24ebdb3d0..76350d43ce2a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -400,6 +400,7 @@ struct rxe_dev {
 	spinlock_t		mcg_lock; /* guard multicast groups */
 	struct rb_root		mcg_tree;
 	atomic_t		mcg_num;
+	atomic_t		mcg_attach;
 
 	spinlock_t		pending_lock; /* guard pending_mmaps */
 	struct list_head	pending_mmaps;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 18/26] RDMA/rxe: Convert mca read locking to RCU
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (16 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 17/26] RDMA/rxe: Separate code into subroutines Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-28 18:39   ` Jason Gunthorpe
  2022-01-27 21:37 ` [RFC PATCH v9 19/26] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace spinlocks with RCU read locks for read side operations
on mca in rxe_recv.c and rxe_mcast.c. Use RCU list extensions on
write side operations and keep spinlocks to serialize write threads.
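
This is the classic RCU-protected list pattern: readers run locklessly
under rcu_read_lock() while writers serialize on rxe->mcg_lock and
defer freeing past a grace period. Schematically (illustrative;
handle_qp() stands in for the real per-qp work):

	struct rxe_mca *mca;

	/* reader, e.g. the packet receive path */
	rcu_read_lock();
	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list)
		handle_qp(mca->qp);
	rcu_read_unlock();

	/* writer, e.g. detach */
	spin_lock_bh(&rxe->mcg_lock);
	list_del_rcu(&mca->qp_list);
	spin_unlock_bh(&rxe->mcg_lock);
	kfree_rcu(mca, rcu);	/* freed only after readers finish */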

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 57 ++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  6 +--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 3 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 865e6e85084f..c193bd4975f7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -27,7 +27,8 @@
  * the mcg is created and an extra kref_put when the qp count decreases
  * to zero.
  *
- * The qp list and the red-black tree are protected by a single
+ * The qp list is protected for read operations by RCU and the qp list and
+ * the red-black tree are protected for write operations by a single
  * rxe->mcg_lock per device.
  */
 
@@ -270,7 +271,7 @@ void rxe_cleanup_mcg(struct kref *kref)
 }
 
 /**
- * __rxe_init_mca - initialize a new mca holding lock
+ * __rxe_init_mca_rcu - initialize a new mca holding lock
  * @qp: qp object
  * @mcg: mcg object
  * @mca: empty space for new mca
@@ -280,7 +281,7 @@ void rxe_cleanup_mcg(struct kref *kref)
  *
  * Returns: 0 on success else an error
  */
-static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
+static int __rxe_init_mca_rcu(struct rxe_qp *qp, struct rxe_mcg *mcg,
 			  struct rxe_mca *mca)
 {
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
@@ -304,7 +305,7 @@ static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
 	rxe_add_ref(qp);
 	mca->qp = qp;
 
-	list_add_tail(&mca->qp_list, &mcg->qp_list);
+	list_add_tail_rcu(&mca->qp_list, &mcg->qp_list);
 
 	return 0;
 }
@@ -324,14 +325,14 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 	int err;
 
 	/* check to see if the qp is already a member of the group */
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			spin_unlock_bh(&rxe->mcg_lock);
+			rcu_read_unlock();
 			return 0;
 		}
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	rcu_read_unlock();
 
 	/* speculative alloc new mca without using GFP_ATOMIC */
 	new_mca = kzalloc(sizeof(*mca), GFP_KERNEL);
@@ -340,16 +341,19 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 
 	spin_lock_bh(&rxe->mcg_lock);
 	/* re-check to see if someone else just attached qp */
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
+			rcu_read_unlock();
 			kfree(new_mca);
 			err = 0;
 			goto done;
 		}
 	}
+	rcu_read_unlock();
 
 	mca = new_mca;
-	err = __rxe_init_mca(qp, mcg, mca);
+	err = __rxe_init_mca_rcu(qp, mcg, mca);
 	if (err)
 		kfree(mca);
 done:
@@ -359,21 +363,23 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 }
 
 /**
- * __rxe_cleanup_mca - cleanup mca object holding lock
+ * __rxe_cleanup_mca_rcu - cleanup mca object holding lock
  * @mca: mca object
  * @mcg: mcg object
  *
  * Context: caller must hold a reference to mcg and rxe->mcg_lock
  */
-static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
+static void __rxe_cleanup_mca_rcu(struct rxe_mca *mca, struct rxe_mcg *mcg)
 {
-	list_del(&mca->qp_list);
+	list_del_rcu(&mca->qp_list);
 
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
 
 	rxe_drop_ref(mca->qp);
+
+	kfree_rcu(mca, rcu);
 }
 
 /**
@@ -386,22 +392,29 @@ static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
 static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
 	struct rxe_dev *rxe = mcg->rxe;
-	struct rxe_mca *mca, *tmp;
+	struct rxe_mca *mca;
+	int ret;
 
 	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		if (mca->qp == qp) {
-			__rxe_cleanup_mca(mca, mcg);
-			if (atomic_read(&mcg->qp_num) <= 0)
-				kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
-			spin_unlock_bh(&rxe->mcg_lock);
-			kfree(mca);
-			return 0;
+			rcu_read_unlock();
+			goto found;
 		}
 	}
+	rcu_read_unlock();
+	ret = -EINVAL;
+	goto done;
+found:
+	__rxe_cleanup_mca_rcu(mca, mcg);
+	if (atomic_read(&mcg->qp_num) <= 0)
+		kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
+	ret = 0;
+done:
 	spin_unlock_bh(&rxe->mcg_lock);
 
-	return -EINVAL;
+	return ret;
 }
 
 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 357a6cea1484..7f2ea61a52c1 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -267,13 +267,13 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
 	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
 
 	n = 0;
-	spin_lock_bh(&rxe->mcg_lock);
-	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
 		qp_array[n++] = mca->qp;
 		if (n == nmax)
 			break;
 	}
-	spin_unlock_bh(&rxe->mcg_lock);
+	rcu_read_unlock();
 	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
 
 	nmax = n;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 76350d43ce2a..12bff190fc1f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -365,6 +365,7 @@ struct rxe_mcg {
 struct rxe_mca {
 	struct list_head	qp_list;
 	struct rxe_qp		*qp;
+	struct rcu_head		rcu;
 };
 
 struct rxe_port {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC PATCH v9 19/26] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (17 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 18/26] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 20/26] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

There is only one remaining object type that allocates its own
memory: MR. So the sense of RXE_POOL_NO_ALLOC is changed to
RXE_POOL_ALLOC. Add checks to rxe_alloc() and rxe_add_to_pool() to
make sure the call matches the setting of this flag.
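
Schematically the two allocation styles are (illustrative):

	/* RXE_POOL_ALLOC set (MR): the pool allocates the object */
	mr = rxe_alloc(&rxe->mr_pool);

	/* flag clear: object embedded in a caller/core allocation */
	err = rxe_add_to_pool(&rxe->pd_pool, pd);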

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 27 ++++++++++++++++++---------
 drivers/infiniband/sw/rxe/rxe_pool.h |  2 +-
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index b6fe7c93aaab..8fc3f0026f69 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -21,19 +21,17 @@ static const struct rxe_type_info {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
-		.flags		= RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 	},
@@ -41,7 +39,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 	},
@@ -50,7 +48,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 	},
@@ -58,7 +56,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-cq",
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 		.cleanup	= rxe_cq_cleanup,
 	},
 	[RXE_TYPE_MR] = {
@@ -66,7 +63,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 	},
@@ -75,7 +72,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
@@ -262,6 +259,12 @@ void *rxe_alloc(struct rxe_pool *pool)
 	struct rxe_pool_elem *elem;
 	void *obj;
 
+	if (!(pool->flags & RXE_POOL_ALLOC)) {
+		pr_warn_once("%s: Pool %s must call rxe_add_to_pool\n",
+				__func__, pool->name);
+		return NULL;
+	}
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
@@ -284,6 +287,12 @@ void *rxe_alloc(struct rxe_pool *pool)
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
+	if (pool->flags & RXE_POOL_ALLOC) {
+		pr_warn_once("%s: Pool %s must call rxe_alloc\n",
+				__func__, pool->name);
+		return -EINVAL;
+	}
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
@@ -308,7 +317,7 @@ void rxe_elem_release(struct kref *kref)
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
-	if (!(pool->flags & RXE_POOL_NO_ALLOC)) {
+	if (pool->flags & RXE_POOL_ALLOC) {
 		obj = elem->obj;
 		kfree(obj);
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 99b1eb04b405..ca7e5c4c44cf 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -9,7 +9,7 @@
 
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_NO_ALLOC	= BIT(4),
+	RXE_POOL_ALLOC		= BIT(2),
 };
 
 enum rxe_elem_type {
-- 
2.32.0



* [RFC PATCH v9 20/26] RDMA/rxe: Delete _locked() APIs for pool objects
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (18 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 19/26] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 21/26] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Since the caller-managed lock variants of the pool APIs for indexed
objects are no longer used, delete these _locked() APIs.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 63 +++-------------------------
 drivers/infiniband/sw/rxe/rxe_pool.h | 24 ++---------
 2 files changed, 10 insertions(+), 77 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 8fc3f0026f69..b3c74988b0e9 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -189,71 +189,29 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 	return 0;
 }
 
-int __rxe_add_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	elem->index = alloc_index(pool);
-	err = rxe_insert_index(pool, elem);
-
-	return err;
-}
-
 int __rxe_add_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 	int err;
 
 	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_index_locked(elem);
+	elem->index = alloc_index(pool);
+	err = rxe_insert_index(pool, elem);
 	write_unlock_bh(&pool->pool_lock);
 
 	return err;
 }
 
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	clear_bit(elem->index - pool->index.min_index, pool->index.table);
-	rb_erase(&elem->index_node, &pool->index.tree);
-}
-
 void __rxe_drop_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 
 	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_index_locked(elem);
+	clear_bit(elem->index - pool->index.min_index, pool->index.table);
+	rb_erase(&elem->index_node, &pool->index.tree);
 	write_unlock_bh(&pool->pool_lock);
 }
 
-void *rxe_alloc_locked(struct rxe_pool *pool)
-{
-	struct rxe_pool_elem *elem;
-	void *obj;
-
-	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
-
-	obj = kzalloc(pool->elem_size, GFP_ATOMIC);
-	if (!obj)
-		goto out_cnt;
-
-	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
-
-	elem->pool = pool;
-	elem->obj = obj;
-	kref_init(&elem->ref_cnt);
-
-	return obj;
-
-out_cnt:
-	atomic_dec(&pool->num_elem);
-	return NULL;
-}
-
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
@@ -325,12 +283,13 @@ void rxe_elem_release(struct kref *kref)
 	atomic_dec(&pool->num_elem);
 }
 
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
 	struct rb_node *node;
 	struct rxe_pool_elem *elem;
 	void *obj;
 
+	read_lock_bh(&pool->pool_lock);
 	node = pool->index.tree.rb_node;
 
 	while (node) {
@@ -350,16 +309,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 	} else {
 		obj = NULL;
 	}
-
-	return obj;
-}
-
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
-{
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_index_locked(pool, index);
 	read_unlock_bh(&pool->pool_lock);
 
 	return obj;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index ca7e5c4c44cf..b7babf4789c7 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -68,9 +68,7 @@ int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 /* free resources from object pool */
 void rxe_pool_cleanup(struct rxe_pool *pool);
 
-/* allocate an object from pool holding and not holding the pool lock */
-void *rxe_alloc_locked(struct rxe_pool *pool);
-
+/* allocate an object from pool */
 void *rxe_alloc(struct rxe_pool *pool);
 
 /* connect already allocated object to pool */
@@ -79,32 +77,18 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
 /* assign an index to an indexed object and insert object into
- *  pool's rb tree holding and not holding the pool_lock
+ * pool's rb tree
  */
-int __rxe_add_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->elem)
-
 int __rxe_add_index(struct rxe_pool_elem *elem);
 
 #define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
 
-/* drop an index and remove object from rb tree
- * holding and not holding the pool_lock
- */
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->elem)
-
+/* drop an index and remove object from rb tree */
 void __rxe_drop_index(struct rxe_pool_elem *elem);
 
 #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
-/* lookup an indexed object from index holding and not holding the pool_lock.
- * takes a reference on object
- */
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
-
+/* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
 /* cleanup an object when all references are dropped */
-- 
2.32.0



* [RFC PATCH v9 21/26] RDMA/rxe: Replace obj by elem in declaration
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (19 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 20/26] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 22/26] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Fix a harmless typo by replacing obj with elem in the cleanup
callback declarations. The mismatch has no functional effect but is
confusing to read.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index b3c74988b0e9..a024c3bf8696 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -12,7 +12,7 @@ static const struct rxe_type_info {
 	const char *name;
 	size_t size;
 	size_t elem_offset;
-	void (*cleanup)(struct rxe_pool_elem *obj);
+	void (*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index b7babf4789c7..3d3470d0e3c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -39,7 +39,7 @@ struct rxe_pool {
 	struct rxe_dev		*rxe;
 	const char		*name;
 	rwlock_t		pool_lock; /* protects pool add/del/search */
-	void			(*cleanup)(struct rxe_pool_elem *obj);
+	void			(*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
 
-- 
2.32.0



* [RFC PATCH v9 22/26] RDMA/rxe: Replace red-black trees by xarrays
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (20 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 21/26] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 23/26] RDMA/rxe: Change pool locking to RCU Bob Pearson
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rxe driver uses red-black trees to map indices to
objects in the rxe object pools. Linux xarrays provide a better way
to implement the same functionality. This patch replaces the
red-black trees with xarrays for pool objects.
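
For background, the xarray API collapses the index allocator and the
lookup structure into a handful of calls. A small self-contained
sketch of the calls this patch leans on (the names and limits here
are arbitrary examples, not the driver's):

	#include <linux/xarray.h>

	static DEFINE_XARRAY_ALLOC(example_xa);
	static u32 example_next;

	/* store obj at a free index, allocated cyclically */
	static int example_insert(void *obj, u32 *index)
	{
		struct xa_limit limit = { .min = 1, .max = 128 };

		return xa_alloc_cyclic_bh(&example_xa, index, obj, limit,
					  &example_next, GFP_KERNEL);
	}

	/* lookup returns NULL when nothing is stored at index */
	static void *example_lookup(u32 index)
	{
		return xa_load(&example_xa, index);
	}

	static void example_remove(u32 index)
	{
		xa_erase(&example_xa, index);
	}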

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  86 ++--------
 drivers/infiniband/sw/rxe/rxe_mr.c    |   1 -
 drivers/infiniband/sw/rxe/rxe_mw.c    |   8 -
 drivers/infiniband/sw/rxe/rxe_pool.c  | 218 +++++++++-----------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  40 ++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  12 --
 6 files changed, 98 insertions(+), 267 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 74c5521e9b3d..de94947df18f 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -114,83 +114,27 @@ static void rxe_init_ports(struct rxe_dev *rxe)
 }
 
 /* init pools of managed objects */
-static int rxe_init_pools(struct rxe_dev *rxe)
+static void rxe_init_pools(struct rxe_dev *rxe)
 {
-	int err;
-
-	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC,
-			    rxe->max_ucontext);
-	if (err)
-		goto err1;
-
-	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD,
-			    rxe->attr.max_pd);
-	if (err)
-		goto err2;
-
-	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH,
-			    rxe->attr.max_ah);
-	if (err)
-		goto err3;
-
-	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ,
-			    rxe->attr.max_srq);
-	if (err)
-		goto err4;
-
-	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP,
-			    rxe->attr.max_qp);
-	if (err)
-		goto err5;
-
-	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ,
-			    rxe->attr.max_cq);
-	if (err)
-		goto err6;
-
-	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR,
-			    rxe->attr.max_mr);
-	if (err)
-		goto err7;
-
-	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW,
-			    rxe->attr.max_mw);
-	if (err)
-		goto err8;
-
-	return 0;
-
-err8:
-	rxe_pool_cleanup(&rxe->mr_pool);
-err7:
-	rxe_pool_cleanup(&rxe->cq_pool);
-err6:
-	rxe_pool_cleanup(&rxe->qp_pool);
-err5:
-	rxe_pool_cleanup(&rxe->srq_pool);
-err4:
-	rxe_pool_cleanup(&rxe->ah_pool);
-err3:
-	rxe_pool_cleanup(&rxe->pd_pool);
-err2:
-	rxe_pool_cleanup(&rxe->uc_pool);
-err1:
-	return err;
+	rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC, rxe->max_ucontext);
+	rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD, rxe->attr.max_pd);
+	rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH, rxe->attr.max_ah);
+	rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ, rxe->attr.max_srq);
+	rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP, rxe->attr.max_qp);
+	rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ, rxe->attr.max_cq);
+	rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR, rxe->attr.max_mr);
+	rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW, rxe->attr.max_mw);
 }
 
 /* initialize rxe device state */
-static int rxe_init(struct rxe_dev *rxe)
+static void rxe_init(struct rxe_dev *rxe)
 {
-	int err;
-
 	/* init default device parameters */
 	rxe_init_device_param(rxe);
 
 	rxe_init_ports(rxe);
 
-	err = rxe_init_pools(rxe);
-	if (err)
-		return err;
+	rxe_init_pools(rxe);
 
 	spin_lock_init(&rxe->mcg_lock);
 	rxe->mcg_tree = RB_ROOT;
@@ -201,8 +145,6 @@ static int rxe_init(struct rxe_dev *rxe)
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
 	mutex_init(&rxe->usdev_lock);
-
-	return 0;
 }
 
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
@@ -224,11 +166,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
  */
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name)
 {
-	int err;
-
-	err = rxe_init(rxe);
-	if (err)
-		return err;
+	rxe_init(rxe);
 
 	rxe_set_mtu(rxe, mtu);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..35628b8a00b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -691,7 +691,6 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_drop_ref(mr_pd(mr));
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 
 	return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 32dd8c0b8b9e..7df36c40eec2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -20,7 +20,6 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 		return ret;
 	}
 
-	rxe_add_index(mw);
 	mw->rkey = ibmw->rkey = (mw->elem.index << 8) | rxe_get_next_key(-1);
 	mw->state = (mw->ibmw.type == IB_MW_TYPE_2) ?
 			RXE_MW_STATE_FREE : RXE_MW_STATE_VALID;
@@ -329,10 +328,3 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 
 	return mw;
 }
-
-void rxe_mw_cleanup(struct rxe_pool_elem *elem)
-{
-	struct rxe_mw *mw = container_of(elem, typeof(*mw), elem);
-
-	rxe_drop_index(mw);
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index a024c3bf8696..928bc56b439f 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -21,11 +21,15 @@ static const struct rxe_type_info {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
@@ -57,6 +61,8 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
 		.cleanup	= rxe_cq_cleanup,
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 	},
 	[RXE_TYPE_MR] = {
 		.name		= "rxe-mr",
@@ -71,44 +77,16 @@ static const struct rxe_type_info {
 		.name		= "rxe-mw",
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
-		.cleanup	= rxe_mw_cleanup,
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
 };
 
-static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
-{
-	int err = 0;
-
-	if ((max - min + 1) < pool->max_elem) {
-		pr_warn("not enough indices for max_elem\n");
-		err = -EINVAL;
-		goto out;
-	}
-
-	pool->index.max_index = max;
-	pool->index.min_index = min;
-
-	pool->index.table = bitmap_zalloc(max - min + 1, GFP_KERNEL);
-	if (!pool->index.table) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-out:
-	return err;
-}
-
-int rxe_pool_init(
-	struct rxe_dev		*rxe,
-	struct rxe_pool		*pool,
-	enum rxe_elem_type	type,
-	unsigned int		max_elem)
+void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+		   enum rxe_elem_type type, unsigned int max_elem)
 {
 	const struct rxe_type_info *info = &rxe_type_info[type];
-	int			err = 0;
 
 	memset(pool, 0, sizeof(*pool));
 
@@ -125,110 +103,54 @@ int rxe_pool_init(
 
 	rwlock_init(&pool->pool_lock);
 
-	if (pool->flags & RXE_POOL_INDEX) {
-		pool->index.tree = RB_ROOT;
-		err = rxe_pool_init_index(pool, info->max_index,
-					  info->min_index);
-		if (err)
-			goto out;
-	}
-
-out:
-	return err;
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	pool->limit.max = info->max_index;
+	pool->limit.min = info->min_index;
 }
 
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
-	if (atomic_read(&pool->num_elem) > 0)
-		pr_warn("%s pool destroyed with unfree'd elem\n",
-			pool->name);
-
-	if (pool->flags & RXE_POOL_INDEX)
-		bitmap_free(pool->index.table);
-}
-
-static u32 alloc_index(struct rxe_pool *pool)
-{
-	u32 index;
-	u32 range = pool->index.max_index - pool->index.min_index + 1;
-
-	index = find_next_zero_bit(pool->index.table, range, pool->index.last);
-	if (index >= range)
-		index = find_first_zero_bit(pool->index.table, range);
-
-	WARN_ON_ONCE(index >= range);
-	set_bit(index, pool->index.table);
-	pool->index.last = index;
-	return index + pool->index.min_index;
-}
-
-static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->index.tree.rb_node;
-	struct rb_node *parent = NULL;
 	struct rxe_pool_elem *elem;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, index_node);
-
-		if (elem->index == new->index) {
-			pr_warn("element already exists!\n");
-			return -EINVAL;
+	unsigned long index = 0;
+	unsigned long max = ULONG_MAX;
+	unsigned int elem_count = 0;
+	unsigned int free_count = 0;
+
+	do {
+		elem = xa_find(&pool->xa, &index, max, XA_PRESENT);
+		if (elem) {
+			elem_count++;
+			xa_erase(&pool->xa, index);
+			if (pool->flags & RXE_POOL_ALLOC) {
+				kfree(elem->obj);
+				free_count++;
+			}
 		}
+	} while (elem);
 
-		if (elem->index > new->index)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->index_node, parent, link);
-	rb_insert_color(&new->index_node, &pool->index.tree);
-
-	return 0;
-}
-
-int __rxe_add_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	elem->index = alloc_index(pool);
-	err = rxe_insert_index(pool, elem);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	clear_bit(elem->index - pool->index.min_index, pool->index.table);
-	rb_erase(&elem->index_node, &pool->index.tree);
-	write_unlock_bh(&pool->pool_lock);
+	if (elem_count || free_count)
+		pr_warn("Freed %d indices and %d objects from pool %s\n",
+				elem_count, free_count, pool->name);
 }
 
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
+	int err;
 
 	if (!(pool->flags & RXE_POOL_ALLOC)) {
-		pr_warn_once("%s: Pool %s must call rxe_add_to_pool\n",
+		pr_warn_once("%s: pool %s must call rxe_add_to_pool\n",
 				__func__, pool->name);
 		return NULL;
 	}
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_cnt;
 
 	obj = kzalloc(pool->elem_size, GFP_KERNEL);
 	if (!obj)
-		goto out_cnt;
+		goto err_cnt;
 
 	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
@@ -236,36 +158,66 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
+	err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem, pool->limit,
+			&pool->next, GFP_KERNEL);
+	if (err)
+		goto err_free;
+
 	return obj;
 
-out_cnt:
+err_free:
+	kfree(obj);
+err_cnt:
 	atomic_dec(&pool->num_elem);
 	return NULL;
 }
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
+	int err;
+
 	if (pool->flags & RXE_POOL_ALLOC) {
-		pr_warn_once("%s: Pool %s must call rxe_alloc\n",
+		pr_warn_once("%s: pool %s must call rxe_alloc\n",
 				__func__, pool->name);
 		return -EINVAL;
 	}
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_cnt;
 
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
+	err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem, pool->limit,
+			&pool->next, GFP_KERNEL);
+	if (err)
+		goto err_cnt;
+
 	return 0;
 
-out_cnt:
+err_cnt:
 	atomic_dec(&pool->num_elem);
 	return -EINVAL;
 }
 
-void rxe_elem_release(struct kref *kref)
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
+{
+	struct rxe_pool_elem *elem;
+	void *obj;
+
+	read_lock_bh(&pool->pool_lock);
+	elem = xa_load(&pool->xa, index);
+	if (elem && kref_get_unless_zero(&elem->ref_cnt))
+		obj = elem->obj;
+	else
+		obj = NULL;
+	read_unlock_bh(&pool->pool_lock);
+
+	return obj;
+}
+
+static void rxe_elem_release(struct kref *kref)
 {
 	struct rxe_pool_elem *elem =
 		container_of(kref, struct rxe_pool_elem, ref_cnt);
@@ -280,36 +232,16 @@ void rxe_elem_release(struct kref *kref)
 		kfree(obj);
 	}
 
+	xa_erase(&pool->xa, elem->index);
 	atomic_dec(&pool->num_elem);
 }
 
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
+int __rxe_add_ref(struct rxe_pool_elem *elem)
 {
-	struct rb_node *node;
-	struct rxe_pool_elem *elem;
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	node = pool->index.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, index_node);
-
-		if (elem->index > index)
-			node = node->rb_left;
-		else if (elem->index < index)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
-	read_unlock_bh(&pool->pool_lock);
+	return kref_get_unless_zero(&elem->ref_cnt);
+}
 
-	return obj;
+int __rxe_drop_ref(struct rxe_pool_elem *elem)
+{
+	return kref_put(&elem->ref_cnt, rxe_elem_release);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 3d3470d0e3c8..c985ed519066 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -29,9 +29,6 @@ struct rxe_pool_elem {
 	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
-
-	/* only used if indexed */
-	struct rb_node		index_node;
 	u32			index;
 };
 
@@ -48,21 +45,17 @@ struct rxe_pool {
 	size_t			elem_size;
 	size_t			elem_offset;
 
-	/* only used if indexed */
-	struct {
-		struct rb_root		tree;
-		unsigned long		*table;
-		u32			last;
-		u32			max_index;
-		u32			min_index;
-	} index;
+	struct xarray		xa;
+	struct xa_limit		limit;
+	u32			next;
+	int			locked;	/* ?? */
 };
 
 /* initialize a pool of objects with given limit on
  * number of elements. gets parameters from rxe_type_info
  * pool elements will be allocated out of a slab cache
  */
-int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 		  enum rxe_elem_type type, u32 max_elem);
 
 /* free resources from object pool */
@@ -76,28 +69,17 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
-/* assign an index to an indexed object and insert object into
- * pool's rb tree
- */
-int __rxe_add_index(struct rxe_pool_elem *elem);
-
-#define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
-
-/* drop an index and remove object from rb tree */
-void __rxe_drop_index(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
-
 /* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* cleanup an object when all references are dropped */
-void rxe_elem_release(struct kref *kref);
-
 /* take a reference on an object */
-#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
+int __rxe_add_ref(struct rxe_pool_elem *elem);
+
+#define rxe_add_ref(obj) __rxe_add_ref(&(obj)->elem)
 
 /* drop a reference on an object */
-#define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
+int __rxe_drop_ref(struct rxe_pool_elem *elem);
+
+#define rxe_drop_ref(obj) __rxe_drop_ref(&(obj)->elem)
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 9f0aef4b649d..3ca374f1cf9b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -181,7 +181,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		return err;
 
 	/* create index > 0 */
-	rxe_add_index(ah);
 	ah->ah_num = ah->elem.index;
 
 	if (uresp) {
@@ -189,7 +188,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
 					 sizeof(uresp->ah_num));
 		if (err) {
-			rxe_drop_index(ah);
 			rxe_drop_ref(ah);
 			return -EFAULT;
 		}
@@ -230,7 +228,6 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
-	rxe_drop_index(ah);
 	rxe_drop_ref(ah);
 	return 0;
 }
@@ -437,7 +434,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	if (err)
 		return err;
 
-	rxe_add_index(qp);
 	err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibqp->pd, udata);
 	if (err)
 		goto qp_init;
@@ -445,7 +441,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	return 0;
 
 qp_init:
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return err;
 }
@@ -500,7 +495,6 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 		return ret;
 
 	rxe_qp_destroy(qp);
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return 0;
 }
@@ -903,7 +897,6 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	rxe_add_index(mr);
 	rxe_add_ref(pd);
 	rxe_mr_init_dma(pd, access, mr);
 
@@ -927,7 +920,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 		goto err2;
 	}
 
-	rxe_add_index(mr);
 
 	rxe_add_ref(pd);
 
@@ -939,7 +931,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 
 err3:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err2:
 	return ERR_PTR(err);
@@ -962,8 +953,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		goto err1;
 	}
 
-	rxe_add_index(mr);
-
 	rxe_add_ref(pd);
 
 	err = rxe_mr_init_fast(pd, max_num_sg, mr);
@@ -974,7 +963,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 
 err2:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err1:
 	return ERR_PTR(err);
-- 
2.32.0



* [RFC PATCH v9 23/26] RDMA/rxe: Change pool locking to RCU
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (21 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 22/26] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 24/26] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rxe object pools protect both lookups and insertions
with the pool rwlock. RCU provides a cheaper way to protect the read
side. This patch converts the pool locking to RCU for lookups.

Read side operations are protected by rcu_read_lock() and write side
operations, which all come from verbs API calls, are protected by the
xarray's xa_lock spinlock.
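
The core of the lookup after this change looks like the sketch below
(a simplified element struct stands in for struct rxe_pool_elem):

	#include <linux/kref.h>
	#include <linux/rcupdate.h>
	#include <linux/xarray.h>

	struct elem {
		void *obj;
		struct kref ref_cnt;
	};

	static void *pool_lookup(struct xarray *xa, unsigned long index)
	{
		struct elem *elem;
		void *obj = NULL;

		rcu_read_lock();
		elem = xa_load(xa, index);	/* safe under RCU */
		/* refuse to revive an element whose refcount already
		 * reached zero and is waiting out a grace period
		 */
		if (elem && kref_get_unless_zero(&elem->ref_cnt))
			obj = elem->obj;
		rcu_read_unlock();

		return obj;
	}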

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c  | 50 +++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_pool.h  | 19 ++--------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 3 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 928bc56b439f..18cdf5e0ad4e 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
+ * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved.
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
@@ -35,7 +36,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 	},
@@ -43,7 +43,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 	},
@@ -52,7 +51,6 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 	},
@@ -69,7 +67,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
+		.flags		= RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 	},
@@ -77,7 +75,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-mw",
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
@@ -100,14 +97,14 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 	pool->cleanup		= info->cleanup;
 
 	atomic_set(&pool->num_elem, 0);
-
-	rwlock_init(&pool->pool_lock);
+	spin_lock_init(&pool->xa.xa_lock);
 
 	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
 	pool->limit.max = info->max_index;
 	pool->limit.min = info->min_index;
 }
 
+/* runs single threaded at driver shutdown */
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
@@ -204,36 +201,42 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
 	struct rxe_pool_elem *elem;
-	void *obj;
+	void *obj = NULL;
 
-	read_lock_bh(&pool->pool_lock);
+	rcu_read_lock();
 	elem = xa_load(&pool->xa, index);
 	if (elem && kref_get_unless_zero(&elem->ref_cnt))
 		obj = elem->obj;
-	else
-		obj = NULL;
-	read_unlock_bh(&pool->pool_lock);
+	rcu_read_unlock();
 
 	return obj;
 }
 
-static void rxe_elem_release(struct kref *kref)
+static void rxe_obj_free_rcu(struct rcu_head *rcu)
 {
-	struct rxe_pool_elem *elem =
-		container_of(kref, struct rxe_pool_elem, ref_cnt);
+	struct rxe_pool_elem *elem = container_of(rcu, typeof(*elem), rcu);
+
+	kfree(elem->obj);
+}
+
+static void __rxe_elem_release_rcu(struct kref *kref)
+	__releases(&pool->xa.xa_lock)
+{
+	struct rxe_pool_elem *elem = container_of(kref,
+					struct rxe_pool_elem, ref_cnt);
 	struct rxe_pool *pool = elem->pool;
-	void *obj;
+
+	__xa_erase(&pool->xa, elem->index);
+
+	spin_unlock(&pool->xa.xa_lock);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
-	if (pool->flags & RXE_POOL_ALLOC) {
-		obj = elem->obj;
-		kfree(obj);
-	}
-
-	xa_erase(&pool->xa, elem->index);
 	atomic_dec(&pool->num_elem);
+
+	if (pool->flags & RXE_POOL_ALLOC)
+		call_rcu(&elem->rcu, rxe_obj_free_rcu);
 }
 
 int __rxe_add_ref(struct rxe_pool_elem *elem)
@@ -243,5 +246,6 @@ int __rxe_add_ref(struct rxe_pool_elem *elem)
 
 int __rxe_drop_ref(struct rxe_pool_elem *elem)
 {
-	return kref_put(&elem->ref_cnt, rxe_elem_release);
+	return kref_put_lock(&elem->ref_cnt, __rxe_elem_release_rcu,
+			&elem->pool->xa.xa_lock);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index c985ed519066..40026d746563 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -8,8 +8,7 @@
 #define RXE_POOL_H
 
 enum rxe_pool_flags {
-	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_ALLOC		= BIT(2),
+	RXE_POOL_ALLOC		= BIT(1),
 };
 
 enum rxe_elem_type {
@@ -29,13 +28,13 @@ struct rxe_pool_elem {
 	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
+	struct rcu_head		rcu;
 	u32			index;
 };
 
 struct rxe_pool {
 	struct rxe_dev		*rxe;
 	const char		*name;
-	rwlock_t		pool_lock; /* protects pool add/del/search */
 	void			(*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
@@ -48,38 +47,24 @@ struct rxe_pool {
 	struct xarray		xa;
 	struct xa_limit		limit;
 	u32			next;
-	int			locked;	/* ?? */
 };
 
-/* initialize a pool of objects with given limit on
- * number of elements. gets parameters from rxe_type_info
- * pool elements will be allocated out of a slab cache
- */
 void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 		  enum rxe_elem_type type, u32 max_elem);
 
-/* free resources from object pool */
 void rxe_pool_cleanup(struct rxe_pool *pool);
 
-/* allocate an object from pool */
 void *rxe_alloc(struct rxe_pool *pool);
 
-/* connect already allocated object to pool */
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
-
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
-/* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* take a reference on an object */
 int __rxe_add_ref(struct rxe_pool_elem *elem);
-
 #define rxe_add_ref(obj) __rxe_add_ref(&(obj)->elem)
 
-/* drop a reference on an object */
 int __rxe_drop_ref(struct rxe_pool_elem *elem);
-
 #define rxe_drop_ref(obj) __rxe_drop_ref(&(obj)->elem)
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 12bff190fc1f..d70d44392c32 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -309,6 +309,7 @@ static inline int rkey_is_mw(u32 rkey)
 struct rxe_mr {
 	struct rxe_pool_elem	elem;
 	struct ib_mr		ibmr;
+	struct rcu_head		rcu;
 
 	struct ib_umem		*umem;
 
-- 
2.32.0



* [RFC PATCH v9 24/26] RDMA/rxe: Add wait_for_completion to pool objects
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (22 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 23/26] RDMA/rxe: Change pool locking to RCU Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-28  3:58     ` kernel test robot
  2022-01-27 21:37 ` [RFC PATCH v9 25/26] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Reference counting for object deletion means an object may have to
wait for something else to happen before it can be deleted. The
destroy verbs could then return to rdma-core while the object still
holds references. Adding a wait_for_completion() to this path
prevents that.
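
The mechanism is the usual kref-plus-completion handshake. A
self-contained sketch of the idea (the names here are illustrative,
not the driver's):

	#include <linux/completion.h>
	#include <linux/kernel.h>
	#include <linux/kref.h>

	struct obj {
		struct kref ref;
		struct completion complete;
	};

	static void obj_init(struct obj *o)
	{
		kref_init(&o->ref);
		init_completion(&o->complete);
	}

	static void obj_release(struct kref *kref)
	{
		struct obj *o = container_of(kref, struct obj, ref);

		/* last reference is gone; unblock the destroyer */
		complete(&o->complete);
	}

	static void obj_destroy(struct obj *o)
	{
		kref_put(&o->ref, obj_release);

		/* wait until every outstanding reference is dropped so
		 * the destroy verb never returns with refs pending
		 */
		wait_for_completion(&o->complete);
	}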

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c    |  1 +
 drivers/infiniband/sw/rxe/rxe_mw.c    |  3 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 79 ++++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_pool.h  |  4 ++
 drivers/infiniband/sw/rxe/rxe_verbs.c | 11 ++++
 5 files changed, 84 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 35628b8a00b4..6d1ce05bcf65 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -692,6 +692,7 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_drop_ref(mr_pd(mr));
 	rxe_drop_ref(mr);
+	rxe_wait(mr);
 
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 7df36c40eec2..dd3d02db3d03 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -60,8 +60,9 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
 	rxe_do_dealloc_mw(mw);
 	spin_unlock_bh(&mw->lock);
 
-	rxe_drop_ref(mw);
 	rxe_drop_ref(pd);
+	rxe_drop_ref(mw);
+	rxe_wait(mw);
 
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 18cdf5e0ad4e..5402dae01554 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -7,6 +7,7 @@
 
 #include "rxe.h"
 
+#define RXE_POOL_TIMEOUT	(200)
 #define RXE_POOL_ALIGN		(16)
 
 static const struct rxe_type_info {
@@ -154,6 +155,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->pool = pool;
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
+	init_completion(&elem->complete);
 
 	err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem, pool->limit,
 			&pool->next, GFP_KERNEL);
@@ -185,6 +187,7 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
+	init_completion(&elem->complete);
 
 	err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem, pool->limit,
 			&pool->next, GFP_KERNEL);
@@ -212,31 +215,22 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 	return obj;
 }
 
-static void rxe_obj_free_rcu(struct rcu_head *rcu)
-{
-	struct rxe_pool_elem *elem = container_of(rcu, typeof(*elem), rcu);
-
-	kfree(elem->obj);
-}
-
 static void __rxe_elem_release_rcu(struct kref *kref)
 	__releases(&pool->xa.xa_lock)
 {
-	struct rxe_pool_elem *elem = container_of(kref,
-					struct rxe_pool_elem, ref_cnt);
+	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 
 	__xa_erase(&pool->xa, elem->index);
 
-	spin_unlock(&pool->xa.xa_lock);
+	spin_unlock_bh(&pool->xa.xa_lock);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
 	atomic_dec(&pool->num_elem);
 
-	if (pool->flags & RXE_POOL_ALLOC)
-		call_rcu(&elem->rcu, rxe_obj_free_rcu);
+	complete(&elem->complete);
 }
 
 int __rxe_add_ref(struct rxe_pool_elem *elem)
@@ -244,8 +238,67 @@ int __rxe_add_ref(struct rxe_pool_elem *elem)
 	return kref_get_unless_zero(&elem->ref_cnt);
 }
 
+static bool refcount_dec_and_lock_bh(refcount_t *r, spinlock_t *lock)
+	__acquires(lock) __releases(lock)
+{
+	if (refcount_dec_not_one(r))
+		return false;
+
+	spin_lock_bh(lock);
+	if (!refcount_dec_and_test(r)) {
+		spin_unlock_bh(lock);
+		return false;
+	}
+
+	return true;
+}
+
+static int kref_put_lock_bh(struct kref *kref,
+				void (*release)(struct kref *kref),
+				spinlock_t *lock)
+{
+	if (refcount_dec_and_lock_bh(&kref->refcount, lock)) {
+		release(kref);
+		return 1;
+	}
+	return 0;
+}
+
 int __rxe_drop_ref(struct rxe_pool_elem *elem)
 {
-	return kref_put_lock(&elem->ref_cnt, __rxe_elem_release_rcu,
+	return kref_put_lock_bh(&elem->ref_cnt, __rxe_elem_release_rcu,
 			&elem->pool->xa.xa_lock);
 }
+
+static void rxe_obj_free_rcu(struct rcu_head *rcu)
+{
+	struct rxe_pool_elem *elem = container_of(rcu, typeof(*elem), rcu);
+
+	kfree(elem->obj);
+}
+
+int __rxe_wait(struct rxe_pool_elem *elem)
+{
+	struct rxe_pool *pool = elem->pool;
+	static int timeout = RXE_POOL_TIMEOUT;
+	static int timeout_failures;
+	int ret = 0;
+
+	if (timeout) {
+		ret = wait_for_completion_timeout(&elem->complete, timeout);
+		if (!ret) {
+			if (timeout_failures++ == 5) {
+				timeout = 0;
+				pr_warn("Exceeded max completion timeouts. Disabling wait_for_completion\n");
+			} else {
+				pr_warn_ratelimited("Timed out waiting for %s#%d to complete\n",
+					pool->name + 4, elem->index);
+			}
+		}
+	}
+
+	if (pool->flags & RXE_POOL_ALLOC)
+		call_rcu(&elem->rcu, rxe_obj_free_rcu);
+
+	return ret;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 40026d746563..f085750c4c5a 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -29,6 +29,7 @@ struct rxe_pool_elem {
 	struct kref		ref_cnt;
 	struct list_head	list;
 	struct rcu_head		rcu;
+	struct completion	complete;
 	u32			index;
 };
 
@@ -67,4 +68,7 @@ int __rxe_add_ref(struct rxe_pool_elem *elem);
 int __rxe_drop_ref(struct rxe_pool_elem *elem);
 #define rxe_drop_ref(obj) __rxe_drop_ref(&(obj)->elem)
 
+int __rxe_wait(struct rxe_pool_elem *elem);
+#define rxe_wait(obj) __rxe_wait(&(obj)->elem)
+
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 3ca374f1cf9b..f2c1037696c5 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -116,6 +116,7 @@ static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
 	rxe_drop_ref(uc);
+	rxe_wait(uc);
 }
 
 static int rxe_port_immutable(struct ib_device *dev, u32 port_num,
@@ -150,6 +151,7 @@ static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 	struct rxe_pd *pd = to_rpd(ibpd);
 
 	rxe_drop_ref(pd);
+	rxe_wait(pd);
 	return 0;
 }
 
@@ -189,6 +191,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 					 sizeof(uresp->ah_num));
 		if (err) {
 			rxe_drop_ref(ah);
+			rxe_wait(ah);
 			return -EFAULT;
 		}
 	} else if (ah->is_user) {
@@ -229,6 +232,7 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 	struct rxe_ah *ah = to_rah(ibah);
 
 	rxe_drop_ref(ah);
+	rxe_wait(ah);
 	return 0;
 }
 
@@ -315,6 +319,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 err2:
 	rxe_drop_ref(pd);
 	rxe_drop_ref(srq);
+	rxe_wait(srq);
 err1:
 	return err;
 }
@@ -373,6 +378,7 @@ static int rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
 
 	rxe_drop_ref(srq->pd);
 	rxe_drop_ref(srq);
+	rxe_wait(srq);
 	return 0;
 }
 
@@ -442,6 +448,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 
 qp_init:
 	rxe_drop_ref(qp);
+	rxe_wait(qp);
 	return err;
 }
 
@@ -496,6 +503,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 
 	rxe_qp_destroy(qp);
 	rxe_drop_ref(qp);
+	rxe_wait(qp);
 	return 0;
 }
 
@@ -807,6 +815,7 @@ static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 	rxe_cq_disable(cq);
 
 	rxe_drop_ref(cq);
+	rxe_wait(cq);
 	return 0;
 }
 
@@ -932,6 +941,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 err3:
 	rxe_drop_ref(pd);
 	rxe_drop_ref(mr);
+	rxe_wait(mr);
 err2:
 	return ERR_PTR(err);
 }
@@ -964,6 +974,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 err2:
 	rxe_drop_ref(pd);
 	rxe_drop_ref(mr);
+	rxe_wait(mr);
 err1:
 	return ERR_PTR(err);
 }
-- 
2.32.0



* [RFC PATCH v9 25/26] RDMA/rxe: Fix ref error in rxe_av.c
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (23 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 24/26] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-27 21:37 ` [RFC PATCH v9 26/26] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
  2022-01-28 18:42 ` [RFC PATCH v9 00/26] Jason Gunthorpe
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

The commit referenced below can take a reference on the AH which is
never dropped. This only happens in the UD request path. This patch
optionally passes that AH back to the caller so that the caller can
hold the reference while the AV is being accessed and then drop it.
Code to do this is added to rxe_req.c. The AV is also passed to
rxe_prepare() in rxe_net.c as an optimization.
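
The caller-side pattern, simplified from the rxe_req.c changes below
(example_ud_send() is a hypothetical wrapper for illustration):

	static int example_ud_send(struct rxe_pkt_info *pkt)
	{
		struct rxe_ah *ah;
		struct rxe_av *av;

		/* rxe_get_av() may take a reference on the AH; passing
		 * &ah hands that reference back for the caller to drop
		 */
		av = rxe_get_av(pkt, &ah);
		if (unlikely(!av))
			return -EINVAL;

		/* ... build and prepare the packet while the AV is
		 * guaranteed to stay valid ...
		 */

		if (ah)
			rxe_drop_ref(ah);

		return 0;
	}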

Fixes: e2fe06c90806 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c   | 19 +++++++++-
 drivers/infiniband/sw/rxe/rxe_loc.h  |  5 ++-
 drivers/infiniband/sw/rxe/rxe_net.c  | 17 +++++----
 drivers/infiniband/sw/rxe/rxe_req.c  | 55 +++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_resp.c |  2 +-
 5 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 38c7b6fb39d7..360a567159fe 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -99,11 +99,14 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr)
 	av->network_type = type;
 }
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp)
 {
 	struct rxe_ah *ah;
 	u32 ah_num;
 
+	if (ahp)
+		*ahp = NULL;
+
 	if (!pkt || !pkt->qp)
 		return NULL;
 
@@ -117,10 +120,22 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
 	if (ah_num) {
 		/* only new user provider or kernel client */
 		ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
-		if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
+		if (!ah) {
 			pr_warn("Unable to find AH matching ah_num\n");
 			return NULL;
 		}
+
+		if (rxe_ah_pd(ah) != pkt->qp->pd) {
+			pr_warn("PDs don't match for AH and QP\n");
+			rxe_drop_ref(ah);
+			return NULL;
+		}
+
+		if (ahp)
+			*ahp = ah;
+		else
+			rxe_drop_ref(ah);
+
 		return &ah->av;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 0bc1b7e2877c..31a052c5d5f8 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -19,7 +19,7 @@ void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
 
 void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr);
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt);
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp);
 
 /* rxe_cq.c */
 int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
@@ -95,7 +95,8 @@ void rxe_mw_cleanup(struct rxe_pool_elem *arg);
 /* rxe_net.c */
 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 				int paylen, struct rxe_pkt_info *pkt);
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb);
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb);
 const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index a8cfa7160478..b06f22ffc5a8 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -271,13 +271,13 @@ static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
 	ip6h->payload_len = htons(skb->len - sizeof(*ip6h));
 }
 
-static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare4(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
 	bool xnet = false;
 	__be16 df = htons(IP_DF);
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in_addr *saddr = &av->sgid_addr._sockaddr_in.sin_addr;
 	struct in_addr *daddr = &av->dgid_addr._sockaddr_in.sin_addr;
 
@@ -297,11 +297,11 @@ static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare6(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in6_addr *saddr = &av->sgid_addr._sockaddr_in6.sin6_addr;
 	struct in6_addr *daddr = &av->dgid_addr._sockaddr_in6.sin6_addr;
 
@@ -322,16 +322,17 @@ static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb)
 {
 	int err = 0;
 
 	if (skb->protocol == htons(ETH_P_IP))
-		err = prepare4(pkt, skb);
+		err = prepare4(av, pkt, skb);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		err = prepare6(pkt, skb);
+		err = prepare6(av, pkt, skb);
 
-	if (ether_addr_equal(skb->dev->dev_addr, rxe_get_av(pkt)->dmac))
+	if (ether_addr_equal(skb->dev->dev_addr, av->dmac))
 		pkt->mask |= RXE_LOOPBACK_MASK;
 
 	return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 5eb89052dd66..f44535f82bea 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -358,6 +358,7 @@ static inline int get_mtu(struct rxe_qp *qp)
 }
 
 static struct sk_buff *init_req_packet(struct rxe_qp *qp,
+				       struct rxe_av *av,
 				       struct rxe_send_wqe *wqe,
 				       int opcode, int payload,
 				       struct rxe_pkt_info *pkt)
@@ -365,7 +366,6 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	struct rxe_dev		*rxe = to_rdev(qp->ibqp.device);
 	struct sk_buff		*skb;
 	struct rxe_send_wr	*ibwr = &wqe->wr;
-	struct rxe_av		*av;
 	int			pad = (-payload) & 0x3;
 	int			paylen;
 	int			solicited;
@@ -374,21 +374,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* length from start of bth to end of icrc */
 	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
-
-	/* pkt->hdr, port_num and mask are initialized in ifc layer */
-	pkt->rxe	= rxe;
-	pkt->opcode	= opcode;
-	pkt->qp		= qp;
-	pkt->psn	= qp->req.psn;
-	pkt->mask	= rxe_opcode[opcode].mask;
-	pkt->paylen	= paylen;
-	pkt->wqe	= wqe;
+	pkt->paylen = paylen;
 
 	/* init skb */
-	av = rxe_get_av(pkt);
-	if (!av)
-		return NULL;
-
 	skb = rxe_init_packet(rxe, av, paylen, pkt);
 	if (unlikely(!skb))
 		return NULL;
@@ -447,13 +435,13 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	return skb;
 }
 
-static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-		       struct rxe_pkt_info *pkt, struct sk_buff *skb,
-		       int paylen)
+static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,
+			 struct rxe_send_wqe *wqe, struct rxe_pkt_info *pkt,
+			 struct sk_buff *skb, int paylen)
 {
 	int err;
 
-	err = rxe_prepare(pkt, skb);
+	err = rxe_prepare(av, pkt, skb);
 	if (err)
 		return err;
 
@@ -608,6 +596,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 int rxe_requester(void *arg)
 {
 	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	struct rxe_pkt_info pkt;
 	struct sk_buff *skb;
 	struct rxe_send_wqe *wqe;
@@ -619,6 +608,8 @@ int rxe_requester(void *arg)
 	struct rxe_send_wqe rollback_wqe;
 	u32 rollback_psn;
 	struct rxe_queue *q = qp->sq.queue;
+	struct rxe_ah *ah;
+	struct rxe_av *av;
 
 	rxe_add_ref(qp);
 
@@ -705,14 +696,28 @@ int rxe_requester(void *arg)
 		payload = mtu;
 	}
 
-	skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
+	pkt.rxe = rxe;
+	pkt.opcode = opcode;
+	pkt.qp = qp;
+	pkt.psn = qp->req.psn;
+	pkt.mask = rxe_opcode[opcode].mask;
+	pkt.wqe = wqe;
+
+	av = rxe_get_av(&pkt, &ah);
+	if (unlikely(!av)) {
+		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
+		wqe->status = IB_WC_LOC_QP_OP_ERR;
+		goto err_drop_ah;
+	}
+
+	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
 	if (unlikely(!skb)) {
 		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
 		wqe->status = IB_WC_LOC_QP_OP_ERR;
-		goto err;
+		goto err_drop_ah;
 	}
 
-	ret = finish_packet(qp, wqe, &pkt, skb, payload);
+	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
 	if (unlikely(ret)) {
 		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
 		if (ret == -EFAULT)
@@ -720,9 +725,12 @@ int rxe_requester(void *arg)
 		else
 			wqe->status = IB_WC_LOC_QP_OP_ERR;
 		kfree_skb(skb);
-		goto err;
+		goto err_drop_ah;
 	}
 
+	if (ah)
+		rxe_drop_ref(ah);
+
 	/*
 	 * To prevent a race on wqe access between requester and completer,
 	 * wqe members state and psn need to be set before calling
@@ -751,6 +759,9 @@ int rxe_requester(void *arg)
 
 	goto next_wqe;
 
+err_drop_ah:
+	if (ah)
+		rxe_drop_ref(ah);
 err:
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e8f435fa6e4d..f589f4dde35c 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -632,7 +632,7 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	if (ack->mask & RXE_ATMACK_MASK)
 		atmack_set_orig(ack, qp->resp.atomic_orig);
 
-	err = rxe_prepare(ack, skb);
+	err = rxe_prepare(&qp->pri_av, ack, skb);
 	if (err) {
 		kfree_skb(skb);
 		return NULL;
-- 
2.32.0



* [RFC PATCH v9 26/26] RDMA/rxe: Replace mr by rkey in responder resources
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (24 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 25/26] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
@ 2022-01-27 21:37 ` Bob Pearson
  2022-01-28 18:42 ` [RFC PATCH v9 00/26] Jason Gunthorpe
  26 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-27 21:37 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe saves a copy of the MR in the responder resources for
RDMA reads. Since the responder resources are never freed, only
overwritten when more are needed, this MR reference may not be
dropped until the QP is destroyed. This patch stores the rkey instead
of the MR, and on each subsequent packet of a multi-packet read reply
message it looks up the MR from the rkey. This makes it possible for
a user to deregister an MR or unbind an MW on the fly and still get
correct behaviour.
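
The revalidation boils down to decoding the rkey and re-taking a
reference per packet. A sketch of the MR-only case (the MW path in
the patch follows the same shape; lookup_mr_by_rkey() is illustrative,
the patch's actual helper is rxe_recheck_mr()):

	static struct rxe_mr *lookup_mr_by_rkey(struct rxe_dev *rxe, u32 rkey)
	{
		struct rxe_mr *mr;

		/* the low 8 bits of an rkey are the variable key;
		 * the remaining bits index the MR pool
		 */
		mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
		if (!mr)
			return NULL;

		if (mr->rkey != rkey || mr->state != RXE_MR_STATE_VALID) {
			rxe_drop_ref(mr);	/* stale or invalidated */
			return NULL;
		}

		return mr;
	}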

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  10 +--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 123 ++++++++++++++++++--------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 3 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 742073ce0709..c595a140e893 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -135,12 +135,8 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
-	if (res->type == RXE_ATOMIC_MASK) {
+	if (res->type == RXE_ATOMIC_MASK)
 		kfree_skb(res->atomic.skb);
-	} else if (res->type == RXE_READ_MASK) {
-		if (res->read.mr)
-			rxe_drop_ref(res->read.mr);
-	}
 	res->type = 0;
 }
 
@@ -825,10 +821,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 	if (qp->pd)
 		rxe_drop_ref(qp->pd);
 
-	if (qp->resp.mr) {
+	if (qp->resp.mr)
 		rxe_drop_ref(qp->resp.mr);
-		qp->resp.mr = NULL;
-	}
 
 	if (qp_type(qp) == IB_QPT_RC)
 		sk_dst_reset(qp->sk->sk);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f589f4dde35c..c776289842e5 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -641,6 +641,78 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	return skb;
 }
 
+static struct resp_res *rxe_prepare_read_res(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt)
+{
+	struct resp_res *res;
+	u32 pkts;
+
+	res = &qp->resp.resources[qp->resp.res_head];
+	rxe_advance_resp_resource(qp);
+	free_rd_atomic_resource(qp, res);
+
+	res->type = RXE_READ_MASK;
+	res->replay = 0;
+	res->read.va = qp->resp.va + qp->resp.offset;
+	res->read.va_org = qp->resp.va + qp->resp.offset;
+	res->read.resid = qp->resp.resid;
+	res->read.length = qp->resp.resid;
+	res->read.rkey = qp->resp.rkey;
+
+	pkts = max_t(u32, (reth_len(pkt) + qp->mtu - 1)/qp->mtu, 1);
+	res->first_psn = pkt->psn;
+	res->cur_psn = pkt->psn;
+	res->last_psn = (pkt->psn + pkts - 1) & BTH_PSN_MASK;
+
+	res->state = rdatm_res_state_new;
+
+	return res;
+}
+
+/**
+ * rxe_recheck_mr - revalidate MR from rkey and get a reference
+ * @qp: the qp
+ * @rkey: the rkey
+ *
+ * This code allows the MR to be invalidated or deregistered, or the
+ * MW, if one was used, to be invalidated or deallocated.
+ * It is assumed that the access permissions, if originally good,
+ * are still OK and that the mappings are unchanged.
+ *
+ * Return: mr on success else NULL
+ */
+static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_mr *mr;
+	struct rxe_mw *mw;
+
+	if (rkey_is_mw(rkey)) {
+		mw = rxe_pool_get_index(&rxe->mw_pool, rkey >> 8);
+		if (!mw || mw->rkey != rkey)
+			return NULL;
+
+		if (mw->state != RXE_MW_STATE_VALID) {
+			rxe_drop_ref(mw);
+			return NULL;
+		}
+
+		mr = mw->mr;
+		rxe_drop_ref(mw);
+	} else {
+		mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
+		if (!mr || mr->rkey != rkey)
+			return NULL;
+	}
+
+	if (mr->state != RXE_MR_STATE_VALID) {
+		rxe_drop_ref(mr);
+		return NULL;
+	}
+
+	return mr;
+}
+
 /* RDMA read response. If res is not NULL, then we have a current RDMA request
  * being processed or replayed.
  */
@@ -655,53 +727,26 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	int opcode;
 	int err;
 	struct resp_res *res = qp->resp.res;
+	struct rxe_mr *mr;
 
 	if (!res) {
-		/* This is the first time we process that request. Get a
-		 * resource
-		 */
-		res = &qp->resp.resources[qp->resp.res_head];
-
-		free_rd_atomic_resource(qp, res);
-		rxe_advance_resp_resource(qp);
-
-		res->type		= RXE_READ_MASK;
-		res->replay		= 0;
-
-		res->read.va		= qp->resp.va +
-					  qp->resp.offset;
-		res->read.va_org	= qp->resp.va +
-					  qp->resp.offset;
-
-		res->first_psn		= req_pkt->psn;
-
-		if (reth_len(req_pkt)) {
-			res->last_psn	= (req_pkt->psn +
-					   (reth_len(req_pkt) + mtu - 1) /
-					   mtu - 1) & BTH_PSN_MASK;
-		} else {
-			res->last_psn	= res->first_psn;
-		}
-		res->cur_psn		= req_pkt->psn;
-
-		res->read.resid		= qp->resp.resid;
-		res->read.length	= qp->resp.resid;
-		res->read.rkey		= qp->resp.rkey;
-
-		/* note res inherits the reference to mr from qp */
-		res->read.mr		= qp->resp.mr;
-		qp->resp.mr		= NULL;
-
-		qp->resp.res		= res;
-		res->state		= rdatm_res_state_new;
+		res = rxe_prepare_read_res(qp, req_pkt);
+		qp->resp.res = res;
 	}
 
 	if (res->state == rdatm_res_state_new) {
+		mr = qp->resp.mr;
+		qp->resp.mr = NULL;
+
 		if (res->read.resid <= mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
 		else
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
 	} else {
+		mr = rxe_recheck_mr(qp, res->read.rkey);
+		if (!mr)
+			return RESPST_ERR_RKEY_VIOLATION;
+
 		if (res->read.resid > mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
 		else
@@ -717,10 +762,12 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (!skb)
 		return RESPST_ERR_RNR;
 
-	err = rxe_mr_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt),
+	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
 		pr_err("Failed copying memory\n");
+	if (mr)
+		rxe_drop_ref(mr);
 
 	if (bth_pad(&ack_pkt)) {
 		u8 *pad = payload_addr(&ack_pkt) + payload;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d70d44392c32..81996e5af079 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -157,7 +157,6 @@ struct resp_res {
 			struct sk_buff	*skb;
 		} atomic;
 		struct {
-			struct rxe_mr	*mr;
 			u64		va_org;
 			u32		rkey;
 			u32		length;
-- 
2.32.0


* Re: [RFC PATCH v9 24/26] RDMA/rxe: Add wait_for_completion to pool objects
  2022-01-27 21:37 ` [RFC PATCH v9 24/26] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
@ 2022-01-28  3:58     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2022-01-28  3:58 UTC (permalink / raw)
  To: Bob Pearson; +Cc: llvm, kbuild-all

Hi Bob,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rdma/for-next]
[also build test WARNING on v5.17-rc1 next-20220127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Bob-Pearson/RDMA-rxe-Move-rxe_mcast_add-delete-to-rxe_mcast-c/20220128-063757
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: x86_64-randconfig-a016-20220124 (https://download.01.org/0day-ci/archive/20220128/202201281100.RzpcbUSw-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 33b45ee44b1f32ffdbc995e6fec806271b4b3ba4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/0b97a20ee58915580069267b6e4ec2e643dc12b5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bob-Pearson/RDMA-rxe-Move-rxe_mcast_add-delete-to-rxe_mcast-c/20220128-063757
        git checkout 0b97a20ee58915580069267b6e4ec2e643dc12b5
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/infiniband/sw/rxe/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/infiniband/sw/rxe/rxe_pool.c:287:2: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (timeout) {
           ^~~~~~~~~~~~
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/infiniband/sw/rxe/rxe_pool.c:303:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   drivers/infiniband/sw/rxe/rxe_pool.c:287:2: note: remove the 'if' if its condition is always true
           if (timeout) {
           ^~~~~~~~~~~~~
   include/linux/compiler.h:56:23: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                         ^
   drivers/infiniband/sw/rxe/rxe_pool.c:285:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +287 drivers/infiniband/sw/rxe/rxe_pool.c

   279	
   280	int __rxe_wait(struct rxe_pool_elem *elem)
   281	{
   282		struct rxe_pool *pool = elem->pool;
   283		static int timeout = RXE_POOL_TIMEOUT;
   284		static int timeout_failures;
   285		int ret;
   286	
 > 287		if (timeout) {
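
Initializing ret at its declaration would silence this, per the robot's
own suggestion (a sketch built only from the excerpt above; returning 0
on the !timeout path is an assumption about the intended behaviour):

	int __rxe_wait(struct rxe_pool_elem *elem)
	{
		struct rxe_pool *pool = elem->pool;
		static int timeout = RXE_POOL_TIMEOUT;
		static int timeout_failures;
		int ret = 0;	/* initialized: the !timeout path now returns success */

		if (timeout) {
			/* ... wait for the completion, set ret on timeout ... */
		}

		return ret;
	}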

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

* Re: [RFC PATCH v9 04/26] RDMA/rxe: Enforce IBA o10-2.2.3
  2022-01-27 21:37 ` [RFC PATCH v9 04/26] RDMA/rxe: Enforce IBA o10-2.2.3 Bob Pearson
@ 2022-01-28 12:53   ` Jason Gunthorpe
  2022-01-28 16:18     ` Bob Pearson
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 12:53 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:33PM -0600, Bob Pearson wrote:
> Add code to check if a QP is attached to one or more multicast groups
> when destroy_qp is called and return an error if so.

The core code already does some of this anyhow..

> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 949784198d80..34e3c52f0b72 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -114,6 +114,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>  	grp->num_qp++;
>  	elem->qp = qp;
>  	elem->grp = grp;
> +	atomic_inc(&qp->mcg_num);

eg what prevents qp from being concurrently destroyed here?

The core code does, because it doesn't allow a multicast group to be
added concurrently with destruction of a QP.

> +int rxe_qp_chk_destroy(struct rxe_qp *qp)
> +{
> +	/* See IBA o10-2.2.3
> +	 * An attempt to destroy a QP while attached to a mcast group
> +	 * will fail immediately.
> +	 */
> +	if (atomic_read(&qp->mcg_num)) {
> +		pr_warn_once("Attempt to destroy QP while attached to multicast group\n");
> +		return -EBUSY;

Don't print

But yes, I think drivers are expected to do this, though most likely
this is already happening for other reasons and this is merely
protective against bugs.

Jason

* Re: [RFC PATCH v9 04/26] RDMA/rxe: Enforce IBA o10-2.2.3
  2022-01-28 12:53   ` Jason Gunthorpe
@ 2022-01-28 16:18     ` Bob Pearson
  2022-01-28 16:42       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Bob Pearson @ 2022-01-28 16:18 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 1/28/22 06:53, Jason Gunthorpe wrote:
> On Thu, Jan 27, 2022 at 03:37:33PM -0600, Bob Pearson wrote:
>> Add code to check if a QP is attached to one or more multicast groups
>> when destroy_qp is called and return an error if so.
> 
> The core code already does some of this anyhow..
> 
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> index 949784198d80..34e3c52f0b72 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
>> @@ -114,6 +114,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>>  	grp->num_qp++;
>>  	elem->qp = qp;
>>  	elem->grp = grp;
>> +	atomic_inc(&qp->mcg_num);
> 
> eg what prevents qp from being concurrently destroyed here?
> 
> The core code does, because it doesn't allow a multicast group to be
> added concurrently with destruction of a QP.
> 
>> +int rxe_qp_chk_destroy(struct rxe_qp *qp)
>> +{
>> +	/* See IBA o10-2.2.3
>> +	 * An attempt to destroy a QP while attached to a mcast group
>> +	 * will fail immediately.
>> +	 */
>> +	if (atomic_read(&qp->mcg_num)) {
>> +		pr_warn_once("Attempt to destroy QP while attached to multicast group\n");
>> +		return -EBUSY;
> 
> Don't print
> 
> But yes, I think drivers are expected to do this, though most likely
> this is already happening for other reasons and this is merely
> protective against bugs.
> 
> Jason

The real reason for this patch becomes apparent in the next one or two.
With this check in place, half the complexity of rxe_mcast goes away.
I'll get rid of the print. Personally I find these prints helpful when
debugging user code. Maybe a pr_debug?
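
For instance, a sketch of the pr_debug variant (assuming the qp_num()
helper used elsewhere in rxe):

	if (atomic_read(&qp->mcg_num)) {
		pr_debug("QP#%d: attempt to destroy while attached to mcast group\n",
			 qp_num(qp));
		return -EBUSY;
	}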

Bob

* Re: [RFC PATCH v9 04/26] RDMA/rxe: Enforce IBA o10-2.2.3
  2022-01-28 16:18     ` Bob Pearson
@ 2022-01-28 16:42       ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 16:42 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Fri, Jan 28, 2022 at 10:18:45AM -0600, Bob Pearson wrote:
> On 1/28/22 06:53, Jason Gunthorpe wrote:
> > On Thu, Jan 27, 2022 at 03:37:33PM -0600, Bob Pearson wrote:
> >> Add code to check if a QP is attached to one or more multicast groups
> >> when destroy_qp is called and return an error if so.
> > 
> > The core code already does some of this anyhow..
> > 
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> >> index 949784198d80..34e3c52f0b72 100644
> >> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> >> @@ -114,6 +114,7 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
> >>  	grp->num_qp++;
> >>  	elem->qp = qp;
> >>  	elem->grp = grp;
> >> +	atomic_inc(&qp->mcg_num);
> > 
> > eg what prevents qp from being concurrently destroyed here?
> > 
> > The core code does, because it doesn't allow a multicast group to be
> > added concurrently with destruction of a QP.
> > 
> >> +int rxe_qp_chk_destroy(struct rxe_qp *qp)
> >> +{
> >> +	/* See IBA o10-2.2.3
> >> +	 * An attempt to destroy a QP while attached to a mcast group
> >> +	 * will fail immediately.
> >> +	 */
> >> +	if (atomic_read(&qp->mcg_num)) {
> >> +		pr_warn_once("Attempt to destroy QP while attached to multicast group\n");
> >> +		return -EBUSY;
> > 
> > Don't print
> > 
> > But yes, I think drivers are expected to do this, though most likely
> > this is already happening for other reasons and this is merely
> > protective against bugs.
> > 
> > Jason
> 
> The real reason for this patch becomes apparent in the next one or two. With this no longer an issue half the complexity of rxe_mcast goes away. I'll get rid of the print.
> Personally I find them helpful when debugging user code. Maybe a
> pr_debug?

Sure

Jason

* Re: [RFC PATCH v9 07/26] RDMA/rxe: Use kzmalloc/kfree for mca
  2022-01-27 21:37 ` [RFC PATCH v9 07/26] RDMA/rxe: Use kzmalloc/kfree for mca Bob Pearson
@ 2022-01-28 18:00   ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 18:00 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:36PM -0600, Bob Pearson wrote:
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 9336295c4ee2..39f38ee665f2 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -36,6 +36,7 @@ static struct rxe_mcg *create_grp(struct rxe_dev *rxe,
>  	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
>  	if (!grp)
>  		return ERR_PTR(-ENOMEM);
> +	rxe_add_ref(grp);

I have no idea what this ref is for; the grp already has a ref of 1.

You should put the ref incrs near the place that makes a copy of the
pointer. Every pointer should have a ref.

When rxe_alloc_locked() returns the ref is 1 and this ref logically
belongs to the caller

When the caller does rxe_add_key_locked() then the ref is moved into
the rbtree and is now owned by the rbtree

When the caller does rxe_drop_key_locked() then the ref is moved out
of the rbtree and is now owned again by the caller.
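
For illustration, the rule above as a sketch (function names are from
the quoted text; argument lists are approximate, refcount ownership is
noted in comments):

	grp = rxe_alloc_locked(&rxe->mc_grp_pool); /* ref == 1, caller owns it */
	if (!grp)
		return ERR_PTR(-ENOMEM);

	rxe_add_key_locked(grp, mgid);	/* caller's ref moves into the rbtree */

	/* ... on teardown ... */
	rxe_drop_key_locked(grp);	/* ref moves back out to the caller */
	rxe_drop_ref(grp);		/* caller drops the last ref */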

After this patch, this leaks memory on the error unwind:

	err = rxe_mcast_add(rxe, mgid);
	if (unlikely(err)) {
		rxe_drop_key_locked(grp);
		rxe_drop_ref(grp);
		return ERR_PTR(err);
	}

>  	INIT_LIST_HEAD(&grp->qp_list);
>  	spin_lock_init(&grp->mcg_lock);
> @@ -85,12 +86,28 @@ static int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
>  			   struct rxe_mcg *grp)
>  {
>  	int err;
> -	struct rxe_mca *elem;
> +	struct rxe_mca *mca, *new_mca;
>  
> -	/* check to see of the qp is already a member of the group */
> +	/* check to see if the qp is already a member of the group */
>  	spin_lock_bh(&grp->mcg_lock);
> -	list_for_each_entry(elem, &grp->qp_list, qp_list) {
> -		if (elem->qp == qp) {
> +	list_for_each_entry(mca, &grp->qp_list, qp_list) {
> +		if (mca->qp == qp) {
> +			spin_unlock_bh(&grp->mcg_lock);
> +			return 0;
> +		}
> +	}
> +	spin_unlock_bh(&grp->mcg_lock);

It would all be much simpler and faster to change this so the qp has
the list head that stores the list of groups it has joined.

This code never seems to need to go from a group back to the list of
qps.
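
A sketch of that shape (all field names here are hypothetical):

	struct rxe_mca {
		struct rxe_qp		*qp;
		struct rxe_mcg		*mcg;
		struct list_head	qp_list;	/* links into qp->mcg_list */
	};

	struct rxe_qp {
		/* ... */
		struct list_head	mcg_list;	/* mca entries for every group joined */
	};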

Jason

* Re: [RFC PATCH v9 09/26] RDMA/rxe: Introduce RXECB(skb)
  2022-01-27 21:37 ` [RFC PATCH v9 09/26] RDMA/rxe: Introduce RXECB(skb) Bob Pearson
@ 2022-01-28 18:29   ` Jason Gunthorpe
  2022-01-30 17:47     ` Bob Pearson
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 18:29 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:38PM -0600, Bob Pearson wrote:
> Add a #define RXECB(skb) to rxe_hdr.h as a short cut to
> refer to single members of rxe_pkt_info which is stored in skb->cb
> in the receive path. Use this to make some cleanups in rxe_recv.c
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>  drivers/infiniband/sw/rxe/rxe_hdr.h  |  3 ++
>  drivers/infiniband/sw/rxe/rxe_recv.c | 55 +++++++++++++---------------
>  2 files changed, 29 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
> index e432f9e37795..2a85d1e40e6a 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
> @@ -36,6 +36,9 @@ static inline struct sk_buff *PKT_TO_SKB(struct rxe_pkt_info *pkt)
>  	return container_of((void *)pkt, struct sk_buff, cb);
>  }
>  
> +/* alternative to access a single element of rxe_pkt_info from skb */
> +#define RXECB(skb) ((struct rxe_pkt_info *)((skb)->cb))

May as well make this a static inline
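
e.g., a minimal sketch of the inline version, with the same semantics
as the macro:

	static inline struct rxe_pkt_info *RXECB(struct sk_buff *skb)
	{
		return (struct rxe_pkt_info *)skb->cb;
	}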

Jason

* Re: [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree
  2022-01-27 21:37 ` [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree Bob Pearson
@ 2022-01-28 18:32   ` Jason Gunthorpe
  2022-01-30 23:23     ` Bob Pearson
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 18:32 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:41PM -0600, Bob Pearson wrote:
>  
>  struct rxe_mcg {
>  	struct rxe_pool_elem	elem;
> +	struct rb_node		node;
>  	struct rxe_dev		*rxe;
>  	struct list_head	qp_list;
>  	atomic_t		qp_num;
> @@ -397,6 +398,8 @@ struct rxe_dev {
>  	struct rxe_pool		mc_grp_pool;
>  
>  	spinlock_t		mcg_lock; /* guard multicast groups */

I think you should probably just use a mutex here and simplify
things..

Jason

* Re: [RFC PATCH v9 18/26] RDMA/rxe: Convert mca read locking to RCU
  2022-01-27 21:37 ` [RFC PATCH v9 18/26] RDMA/rxe: Convert mca read locking to RCU Bob Pearson
@ 2022-01-28 18:39   ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 18:39 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:47PM -0600, Bob Pearson wrote:
>  /**
> - * __rxe_init_mca - initialize a new mca holding lock
> + * __rxe_init_mca_rcu - initialize a new mca holding lock
>   * @qp: qp object
>   * @mcg: mcg object
>   * @mca: empty space for new mca
> @@ -280,7 +281,7 @@ void rxe_cleanup_mcg(struct kref *kref)
>   *
>   * Returns: 0 on success else an error
>   */
> -static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
> +static int __rxe_init_mca_rcu(struct rxe_qp *qp, struct rxe_mcg *mcg,
>  			  struct rxe_mca *mca)

There is nothing "rcu" about this function..

> @@ -324,14 +325,14 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
>  	int err;
>  
>  	/* check to see if the qp is already a member of the group */
> -	spin_lock_bh(&rxe->mcg_lock);
> -	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
>  		if (mca->qp == qp) {
> -			spin_unlock_bh(&rxe->mcg_lock);
> +			rcu_read_unlock();
>  			return 0;
>  		}
>  	}
> -	spin_unlock_bh(&rxe->mcg_lock);
> +	rcu_read_unlock();

Ok..

> @@ -340,16 +341,19 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
>  
>  	spin_lock_bh(&rxe->mcg_lock);
>  	/* re-check to see if someone else just attached qp */
> -	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();

Do not hold the RCU read lock if you are holding the write side
spinlock. All mutations of the list must hold mcg_lock.

> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
>  		if (mca->qp == qp) {
> +			rcu_read_unlock();
>  			kfree(new_mca);
>  			err = 0;
>  			goto done;
>  		}
>  	}
> +	rcu_read_unlock();
>  
>  	mca = new_mca;
> -	err = __rxe_init_mca(qp, mcg, mca);
> +	err = __rxe_init_mca_rcu(qp, mcg, mca);
>  	if (err)
>  		kfree(mca);

Which looks fine, since the list_add is still inside the spinlock.
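
i.e., the usual split looks roughly like this (a sketch, not the rxe
code; variable names are borrowed from the quoted hunks):

	/* readers: RCU read lock only */
	rcu_read_lock();
	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
		/* read-only access to mca */
	}
	rcu_read_unlock();

	/* writers: the spinlock only, plain list walks, _rcu mutators */
	spin_lock_bh(&rxe->mcg_lock);
	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
		/* mutate under the lock */
	}
	list_add_rcu(&new_mca->qp_list, &mcg->qp_list);
	spin_unlock_bh(&rxe->mcg_lock);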

>  done:
> @@ -359,21 +363,23 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
>  }
>  
>  /**
> - * __rxe_cleanup_mca - cleanup mca object holding lock
> + * __rxe_cleanup_mca_rcu - cleanup mca object holding lock
>   * @mca: mca object
>   * @mcg: mcg object
>   *
>   * Context: caller must hold a reference to mcg and rxe->mcg_lock
>   */
> -static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
> +static void __rxe_cleanup_mca_rcu(struct rxe_mca *mca, struct rxe_mcg *mcg)

Also not rcu, list_del must hold the write side spinlock.

>  {
> -	list_del(&mca->qp_list);
> +	list_del_rcu(&mca->qp_list);
>  
>  	atomic_dec(&mcg->qp_num);
>  	atomic_dec(&mcg->rxe->mcg_attach);
>  	atomic_dec(&mca->qp->mcg_num);
>  
>  	rxe_drop_ref(mca->qp);
> +
> +	kfree_rcu(mca, rcu);

OK

>  }
>  
>  /**
> @@ -386,22 +392,29 @@ static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
>  static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
>  {
>  	struct rxe_dev *rxe = mcg->rxe;
> -	struct rxe_mca *mca, *tmp;
> +	struct rxe_mca *mca;
> +	int ret;
>  
>  	spin_lock_bh(&rxe->mcg_lock);
> -	list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {

As before, don't hold the rcu when holding the write side lock

>  		if (mca->qp == qp) {
> -			__rxe_cleanup_mca(mca, mcg);
> -			if (atomic_read(&mcg->qp_num) <= 0)
> -				kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);
> -			spin_unlock_bh(&rxe->mcg_lock);
> -			kfree(mca);
> -			return 0;
> +			rcu_read_unlock();
> +			goto found;
>  		}
>  	}
> +	rcu_read_unlock();
> +	ret = -EINVAL;
> +	goto done;
> +found:
> +	__rxe_cleanup_mca_rcu(mca, mcg);
> +	if (atomic_read(&mcg->qp_num) <= 0)
> +		kref_put(&mcg->ref_cnt, __rxe_cleanup_mcg);

This is confusing: why an atomic counter alongside a refcount (which is
also an atomic)? Isn't qp_num == 0 the same as list_empty(qp_list)?

> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index 357a6cea1484..7f2ea61a52c1 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -267,13 +267,13 @@ static void rxe_rcv_mcast_pkt(struct sk_buff *skb)
>  	qp_array = kmalloc_array(nmax, sizeof(qp), GFP_KERNEL);
>  
>  	n = 0;
> -	spin_lock_bh(&rxe->mcg_lock);
> -	list_for_each_entry(mca, &mcg->qp_list, qp_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
>  		qp_array[n++] = mca->qp;
>  		if (n == nmax)
>  			break;
>  	}
> -	spin_unlock_bh(&rxe->mcg_lock);
> +	rcu_read_unlock();
>  	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);

I have no idea how this works: what keeps 'qp' valid and prevents it
from being freed once we leave the locking? Remember the mca can be
freed concurrently, so qp is just garbage under RCU at this point.
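
One way to make this safe would be to take a qp reference before
leaving the RCU section, along these lines (a sketch only; the kref
field name and its placement in rxe_pool_elem are assumptions):

	rcu_read_lock();
	list_for_each_entry_rcu(mca, &mcg->qp_list, qp_list) {
		/* skip qps whose refcount has already dropped to zero */
		if (!kref_get_unless_zero(&mca->qp->elem.ref_cnt))
			continue;
		qp_array[n++] = mca->qp;
		if (n == nmax)
			break;
	}
	rcu_read_unlock();
	/* ... use qp_array, then rxe_drop_ref() each qp when done ... */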

Jason

* Re: [RFC PATCH v9 00/26]
  2022-01-27 21:37 [RFC PATCH v9 00/26] Bob Pearson
                   ` (25 preceding siblings ...)
  2022-01-27 21:37 ` [RFC PATCH v9 26/26] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
@ 2022-01-28 18:42 ` Jason Gunthorpe
  2022-02-07 19:20   ` Bob Pearson
  26 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2022-01-28 18:42 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> 
> Bob Pearson (26):
>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
>   RDMA/rxe: Enforce IBA o10-2.2.3
>   RDMA/rxe: Remove rxe_drop_all_macst_groups
>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list

I took these patches to for-next

>   RDMA/rxe: Use kzmalloc/kfree for mca
>   RDMA/rxe: Rename grp to mcg and mce to mca
>   RDMA/rxe: Introduce RXECB(skb)
>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
>   RDMA/rxe: Replace locks by rxe->mcg_lock
>   RDMA/rxe: Replace pool key by rxe->mcg_tree
>   RDMA/rxe: Remove key'ed object support
>   RDMA/rxe: Remove mcg from rxe pools
>   RDMA/rxe: Add code to cleanup mcast memory
>   RDMA/rxe: Add comments to rxe_mcast.c
>   RDMA/rxe: Separate code into subroutines

I think you should try to get up to here done in one series and
merged, it looked OK

>   RDMA/rxe: Convert mca read locking to RCU

Not sure this can ever work..

>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Delete _locked() APIs for pool objects
>   RDMA/rxe: Replace obj by elem in declaration
>   RDMA/rxe: Replace red-black trees by xarrays
>   RDMA/rxe: Change pool locking to RCU
>   RDMA/rxe: Add wait_for_completion to pool objects
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources

These also seem reasonable; I didn't follow why we needed the RCU patch, though.

Jason

* Re: [RFC PATCH v9 09/26] RDMA/rxe: Introduce RXECB(skb)
  2022-01-28 18:29   ` Jason Gunthorpe
@ 2022-01-30 17:47     ` Bob Pearson
  0 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-30 17:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 1/28/22 12:29, Jason Gunthorpe wrote:
> On Thu, Jan 27, 2022 at 03:37:38PM -0600, Bob Pearson wrote:
>> Add a #define RXECB(skb) to rxe_hdr.h as a short cut to
>> refer to single members of rxe_pkt_info which is stored in skb->cb
>> in the receive path. Use this to make some cleanups in rxe_recv.c
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>  drivers/infiniband/sw/rxe/rxe_hdr.h  |  3 ++
>>  drivers/infiniband/sw/rxe/rxe_recv.c | 55 +++++++++++++---------------
>>  2 files changed, 29 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_hdr.h b/drivers/infiniband/sw/rxe/rxe_hdr.h
>> index e432f9e37795..2a85d1e40e6a 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_hdr.h
>> @@ -36,6 +36,9 @@ static inline struct sk_buff *PKT_TO_SKB(struct rxe_pkt_info *pkt)
>>  	return container_of((void *)pkt, struct sk_buff, cb);
>>  }
>>  
>> +/* alternative to access a single element of rxe_pkt_info from skb */
>> +#define RXECB(skb) ((struct rxe_pkt_info *)((skb)->cb))
> 
> May as well make this a static inline
> 
> Jason

Mostly these seem to be #defines elsewhere in the kernel; see e.g. IPCB,
IP6CB, NAPI_GRO_CB, etc. Not sure why, but I copied those.

Bob

* Re: [RFC PATCH v9 12/26] RDMA/rxe: Replace pool key by rxe->mcg_tree
  2022-01-28 18:32   ` Jason Gunthorpe
@ 2022-01-30 23:23     ` Bob Pearson
  0 siblings, 0 replies; 41+ messages in thread
From: Bob Pearson @ 2022-01-30 23:23 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 1/28/22 12:32, Jason Gunthorpe wrote:
> On Thu, Jan 27, 2022 at 03:37:41PM -0600, Bob Pearson wrote:
>>  
>>  struct rxe_mcg {
>>  	struct rxe_pool_elem	elem;
>> +	struct rb_node		node;
>>  	struct rxe_dev		*rxe;
>>  	struct list_head	qp_list;
>>  	atomic_t		qp_num;
>> @@ -397,6 +398,8 @@ struct rxe_dev {
>>  	struct rxe_pool		mc_grp_pool;
>>  
>>  	spinlock_t		mcg_lock; /* guard multicast groups */
> 
> I think you should probably just use a mutex here and simplify
> things..
> 
> Jason

That won't work. All the code in rxe_mcast.c would be OK, but I have to
take the lock in rxe_recv.c when we receive a multicast packet, and
mutexes are not allowed in bottom halves.
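
The constraint, roughly (a sketch of the receive-path context; the
callback name is taken from rxe_net.c):

	/* rxe_udp_encap_recv() -> rxe_rcv() -> rxe_rcv_mcast_pkt() runs in
	 * softirq (bottom-half) context, so sleeping locks are off limits;
	 * only spin_lock_bh()/RCU work here.
	 */
	spin_lock_bh(&rxe->mcg_lock);
	/* ... look up the mcg and walk mcg->qp_list ... */
	spin_unlock_bh(&rxe->mcg_lock);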

Bob

* Re: [RFC PATCH v9 00/26]
  2022-01-28 18:42 ` [RFC PATCH v9 00/26] Jason Gunthorpe
@ 2022-02-07 19:20   ` Bob Pearson
  2022-02-07 19:38     ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Bob Pearson @ 2022-02-07 19:20 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 1/28/22 12:42, Jason Gunthorpe wrote:
> On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>>
>> Bob Pearson (26):
>>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
>>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
>>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
>>   RDMA/rxe: Enforce IBA o10-2.2.3
>>   RDMA/rxe: Remove rxe_drop_all_macst_groups
>>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list
> 
> I took these patches to for-next
> 
>>   RDMA/rxe: Use kzmalloc/kfree for mca
>>   RDMA/rxe: Rename grp to mcg and mce to mca
>>   RDMA/rxe: Introduce RXECB(skb)
>>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
>>   RDMA/rxe: Replace locks by rxe->mcg_lock
>>   RDMA/rxe: Replace pool key by rxe->mcg_tree
>>   RDMA/rxe: Remove key'ed object support
>>   RDMA/rxe: Remove mcg from rxe pools
>>   RDMA/rxe: Add code to cleanup mcast memory
>>   RDMA/rxe: Add comments to rxe_mcast.c
>>   RDMA/rxe: Separate code into subroutines
> 
> I think you should try to get up to here done in one series and
> merged, it looked OK

Jason,

I have these ready again. The series is a little restructured but gets to
the same place. Last time I sent things in you had a complaint, but it got
mangled somehow and I couldn't read it. Is there anything else I should be
looking at before posting these again?

Bob
> 
>>   RDMA/rxe: Convert mca read locking to RCU
> 
> Not sure this can ever work..
> 
>>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>>   RDMA/rxe: Delete _locked() APIs for pool objects
>>   RDMA/rxe: Replace obj by elem in declaration
>>   RDMA/rxe: Replace red-black trees by xarrays
>>   RDMA/rxe: Change pool locking to RCU
>>   RDMA/rxe: Add wait_for_completion to pool objects
>>   RDMA/rxe: Fix ref error in rxe_av.c
>>   RDMA/rxe: Replace mr by rkey in responder resources
> 
> These also seem reasonable, didn't follow why we needed the RCU patch?
> 
> Jason


* Re: [RFC PATCH v9 00/26]
  2022-02-07 19:20   ` Bob Pearson
@ 2022-02-07 19:38     ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2022-02-07 19:38 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Mon, Feb 07, 2022 at 01:20:32PM -0600, Bob Pearson wrote:
> On 1/28/22 12:42, Jason Gunthorpe wrote:
> > On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
> >> There are several race conditions discovered in the current rdma_rxe
> >>
> >> Bob Pearson (26):
> >>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
> >>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
> >>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
> >>   RDMA/rxe: Enforce IBA o10-2.2.3
> >>   RDMA/rxe: Remove rxe_drop_all_macst_groups
> >>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list
> > 
> > I took these patches to for-next
> > 
> >>   RDMA/rxe: Use kzmalloc/kfree for mca
> >>   RDMA/rxe: Rename grp to mcg and mce to mca
> >>   RDMA/rxe: Introduce RXECB(skb)
> >>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
> >>   RDMA/rxe: Replace locks by rxe->mcg_lock
> >>   RDMA/rxe: Replace pool key by rxe->mcg_tree
> >>   RDMA/rxe: Remove key'ed object support
> >>   RDMA/rxe: Remove mcg from rxe pools
> >>   RDMA/rxe: Add code to cleanup mcast memory
> >>   RDMA/rxe: Add comments to rxe_mcast.c
> >>   RDMA/rxe: Separate code into subroutines
> > 
> > I think you should try to get up to here done in one series and
> > merged, it looked OK
> 
> Jason,
> 
> I have these ready again. It is a little restructured but gets to the same place.
> Last time I sent things in you had a complaint but it got mangled somehow so I
> couldn't read it. Is there anything else I should be looking at before posting these
> again?

I think I said you shouldn't re-send patches I've already applied?

Jason
