* [PATCH for-next v11 00/13] Fix race conditions in rxe_pool
@ 2022-03-04  0:07 Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:07 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Several race conditions have been discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series:
 - Makes several minor cleanups in rxe_pool.[ch]
 - Replaces the red-black trees currently used for indices with xarrays
 - Corrects several reference counting errors
 - Adds waits for completion to the paths in the verbs APIs which
   destroy objects
 - Changes read-side locking to RCU (see the sketch after this list).
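
As a quick illustration of where the series ends up (a minimal
sketch, not the literal final code -- the exact form lands in
patches 08 and 12 below), the read-side lookup becomes:

	void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
	{
		struct rxe_pool_elem *elem;
		void *obj = NULL;

		rcu_read_lock();
		elem = xa_load(&pool->xa, index);
		/* refuse an object whose ref count already hit zero */
		if (elem && kref_get_unless_zero(&elem->ref_cnt))
			obj = elem->obj;
		rcu_read_unlock();

		return obj;
	}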

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v11
  Rebased to current for-next.
  Reordered patches and made other changes to respond to issues
  reported by Jason Gunthorpe.
v10
  Rebased to current wip/jgg-for-next.
  Split some patches into smaller ones.
v9
  Corrected issues reported by Jason Gunthorpe
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured the patch series to change to xarrays earlier, which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (13):
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Move max_elem into rxe_type_info
  RDMA/rxe: Shorten pool names in rxe_pool.c
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Use standard names for ref counting
  RDMA/rxe: Stop lookup of partially built objects
  RDMA/rxe: Add wait_for_completion to pool objects
  RDMA/rxe: Convert read side locking to rcu
  RDMA/rxe: Cleanup rxe_pool.c

 drivers/infiniband/sw/rxe/rxe.c       |  88 +----
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   |   5 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c |   4 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  17 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  42 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |  23 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 461 ++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  75 ++---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  38 +--
 drivers/infiniband/sw/rxe/rxe_recv.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c   |  61 ++--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 149 ++++++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  89 +++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 16 files changed, 540 insertions(+), 548 deletions(-)


base-commit: a80501b89152adb29adc7ab943d75c7345f9a3fb
-- 
2.32.0


* [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
@ 2022-03-04  0:07 ` Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 02/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:07 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

The commit referenced below can take a reference on the AH which is
never dropped. This only happens in the UD request path. This patch
optionally passes that AH back to the caller so that the caller can
hold the reference while the AV is being accessed and then drop it.
Code to do this is added to rxe_req.c. The AV is also passed to
rxe_prepare() in rxe_net.c as an optimization.
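
The resulting caller-side pattern, sketched minimally here (error
handling elided; it mirrors the code added to rxe_requester() in the
diff below):

	struct rxe_ah *ah;
	struct rxe_av *av;

	av = rxe_get_av(&pkt, &ah);	/* may take a ref on an AH */
	if (!av)
		goto err;
	/* ... build and transmit the packet using av ... */
	if (ah)
		rxe_drop_ref(ah);	/* drop the ref taken above */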

Fixes: e2fe06c90806 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c   | 19 +++++++++-
 drivers/infiniband/sw/rxe/rxe_loc.h  |  5 ++-
 drivers/infiniband/sw/rxe/rxe_net.c  | 17 +++++----
 drivers/infiniband/sw/rxe/rxe_req.c  | 55 +++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_resp.c |  2 +-
 5 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 38c7b6fb39d7..360a567159fe 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -99,11 +99,14 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr)
 	av->network_type = type;
 }
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp)
 {
 	struct rxe_ah *ah;
 	u32 ah_num;
 
+	if (ahp)
+		*ahp = NULL;
+
 	if (!pkt || !pkt->qp)
 		return NULL;
 
@@ -117,10 +120,22 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
 	if (ah_num) {
 		/* only new user provider or kernel client */
 		ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
-		if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
+		if (!ah) {
 			pr_warn("Unable to find AH matching ah_num\n");
 			return NULL;
 		}
+
+		if (rxe_ah_pd(ah) != pkt->qp->pd) {
+			pr_warn("PDs don't match for AH and QP\n");
+			rxe_drop_ref(ah);
+			return NULL;
+		}
+
+		if (ahp)
+			*ahp = ah;
+		else
+			rxe_drop_ref(ah);
+
 		return &ah->av;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 409efeecd581..2ffbe3390668 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -19,7 +19,7 @@ void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
 
 void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr);
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt);
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp);
 
 /* rxe_cq.c */
 int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
@@ -94,7 +94,8 @@ void rxe_mw_cleanup(struct rxe_pool_elem *arg);
 /* rxe_net.c */
 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 				int paylen, struct rxe_pkt_info *pkt);
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb);
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb);
 const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index a8cfa7160478..b06f22ffc5a8 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -271,13 +271,13 @@ static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
 	ip6h->payload_len = htons(skb->len - sizeof(*ip6h));
 }
 
-static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare4(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
 	bool xnet = false;
 	__be16 df = htons(IP_DF);
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in_addr *saddr = &av->sgid_addr._sockaddr_in.sin_addr;
 	struct in_addr *daddr = &av->dgid_addr._sockaddr_in.sin_addr;
 
@@ -297,11 +297,11 @@ static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare6(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in6_addr *saddr = &av->sgid_addr._sockaddr_in6.sin6_addr;
 	struct in6_addr *daddr = &av->dgid_addr._sockaddr_in6.sin6_addr;
 
@@ -322,16 +322,17 @@ static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb)
 {
 	int err = 0;
 
 	if (skb->protocol == htons(ETH_P_IP))
-		err = prepare4(pkt, skb);
+		err = prepare4(av, pkt, skb);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		err = prepare6(pkt, skb);
+		err = prepare6(av, pkt, skb);
 
-	if (ether_addr_equal(skb->dev->dev_addr, rxe_get_av(pkt)->dmac))
+	if (ether_addr_equal(skb->dev->dev_addr, av->dmac))
 		pkt->mask |= RXE_LOOPBACK_MASK;
 
 	return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 5eb89052dd66..f44535f82bea 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -358,6 +358,7 @@ static inline int get_mtu(struct rxe_qp *qp)
 }
 
 static struct sk_buff *init_req_packet(struct rxe_qp *qp,
+				       struct rxe_av *av,
 				       struct rxe_send_wqe *wqe,
 				       int opcode, int payload,
 				       struct rxe_pkt_info *pkt)
@@ -365,7 +366,6 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	struct rxe_dev		*rxe = to_rdev(qp->ibqp.device);
 	struct sk_buff		*skb;
 	struct rxe_send_wr	*ibwr = &wqe->wr;
-	struct rxe_av		*av;
 	int			pad = (-payload) & 0x3;
 	int			paylen;
 	int			solicited;
@@ -374,21 +374,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* length from start of bth to end of icrc */
 	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
-
-	/* pkt->hdr, port_num and mask are initialized in ifc layer */
-	pkt->rxe	= rxe;
-	pkt->opcode	= opcode;
-	pkt->qp		= qp;
-	pkt->psn	= qp->req.psn;
-	pkt->mask	= rxe_opcode[opcode].mask;
-	pkt->paylen	= paylen;
-	pkt->wqe	= wqe;
+	pkt->paylen = paylen;
 
 	/* init skb */
-	av = rxe_get_av(pkt);
-	if (!av)
-		return NULL;
-
 	skb = rxe_init_packet(rxe, av, paylen, pkt);
 	if (unlikely(!skb))
 		return NULL;
@@ -447,13 +435,13 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	return skb;
 }
 
-static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-		       struct rxe_pkt_info *pkt, struct sk_buff *skb,
-		       int paylen)
+static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,
+			 struct rxe_send_wqe *wqe, struct rxe_pkt_info *pkt,
+			 struct sk_buff *skb, int paylen)
 {
 	int err;
 
-	err = rxe_prepare(pkt, skb);
+	err = rxe_prepare(av, pkt, skb);
 	if (err)
 		return err;
 
@@ -608,6 +596,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 int rxe_requester(void *arg)
 {
 	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	struct rxe_pkt_info pkt;
 	struct sk_buff *skb;
 	struct rxe_send_wqe *wqe;
@@ -619,6 +608,8 @@ int rxe_requester(void *arg)
 	struct rxe_send_wqe rollback_wqe;
 	u32 rollback_psn;
 	struct rxe_queue *q = qp->sq.queue;
+	struct rxe_ah *ah;
+	struct rxe_av *av;
 
 	rxe_add_ref(qp);
 
@@ -705,14 +696,28 @@ int rxe_requester(void *arg)
 		payload = mtu;
 	}
 
-	skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
+	pkt.rxe = rxe;
+	pkt.opcode = opcode;
+	pkt.qp = qp;
+	pkt.psn = qp->req.psn;
+	pkt.mask = rxe_opcode[opcode].mask;
+	pkt.wqe = wqe;
+
+	av = rxe_get_av(&pkt, &ah);
+	if (unlikely(!av)) {
+		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
+		wqe->status = IB_WC_LOC_QP_OP_ERR;
+		goto err_drop_ah;
+	}
+
+	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
 	if (unlikely(!skb)) {
 		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
 		wqe->status = IB_WC_LOC_QP_OP_ERR;
-		goto err;
+		goto err_drop_ah;
 	}
 
-	ret = finish_packet(qp, wqe, &pkt, skb, payload);
+	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
 	if (unlikely(ret)) {
 		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
 		if (ret == -EFAULT)
@@ -720,9 +725,12 @@ int rxe_requester(void *arg)
 		else
 			wqe->status = IB_WC_LOC_QP_OP_ERR;
 		kfree_skb(skb);
-		goto err;
+		goto err_drop_ah;
 	}
 
+	if (ah)
+		rxe_drop_ref(ah);
+
 	/*
 	 * To prevent a race on wqe access between requester and completer,
 	 * wqe members state and psn need to be set before calling
@@ -751,6 +759,9 @@ int rxe_requester(void *arg)
 
 	goto next_wqe;
 
+err_drop_ah:
+	if (ah)
+		rxe_drop_ref(ah);
 err:
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index c369d78fc8e8..b5ebe853748a 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -633,7 +633,7 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	if (ack->mask & RXE_ATMACK_MASK)
 		atmack_set_orig(ack, qp->resp.atomic_orig);
 
-	err = rxe_prepare(ack, skb);
+	err = rxe_prepare(&qp->pri_av, ack, skb);
 	if (err) {
 		kfree_skb(skb);
 		return NULL;
-- 
2.32.0


* [PATCH for-next v11 02/13] RDMA/rxe: Replace mr by rkey in responder resources
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
@ 2022-03-04  0:07 ` Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 03/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:07 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe saves a copy of the MR in the responder resources for
RDMA reads. Since the responder resources are never freed, only
overwritten when more are needed, this MR may not have its reference
dropped until the QP is destroyed. This patch stores the rkey instead
of the MR and, on each subsequent packet of a multi-packet read reply
message, looks up the MR from the rkey. This makes it possible for a
user to deregister an MR or unbind an MW on the fly and still get
correct behaviour.
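
A minimal sketch of the per-packet recheck (assuming the rkey layout
rxe uses, with the pool index in the upper bits and the 8-bit
variable key below it, as seen in rxe_alloc_mw()):

	struct rxe_mr *mr;

	mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
	if (mr && (mr->rkey != rkey || mr->state != RXE_MR_STATE_VALID)) {
		rxe_drop_ref(mr);	/* stale or invalidated key */
		mr = NULL;
	}
	/* a non-NULL mr here is valid and holds a reference */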

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  10 +--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 123 ++++++++++++++++++--------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 3 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 5f270cbf18c6..26d461a8d71c 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -135,12 +135,8 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
-	if (res->type == RXE_ATOMIC_MASK) {
+	if (res->type == RXE_ATOMIC_MASK)
 		kfree_skb(res->atomic.skb);
-	} else if (res->type == RXE_READ_MASK) {
-		if (res->read.mr)
-			rxe_drop_ref(res->read.mr);
-	}
 	res->type = 0;
 }
 
@@ -825,10 +821,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 	if (qp->pd)
 		rxe_drop_ref(qp->pd);
 
-	if (qp->resp.mr) {
+	if (qp->resp.mr)
 		rxe_drop_ref(qp->resp.mr);
-		qp->resp.mr = NULL;
-	}
 
 	if (qp_type(qp) == IB_QPT_RC)
 		sk_dst_reset(qp->sk->sk);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b5ebe853748a..b1ec003f0bb8 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -642,6 +642,78 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	return skb;
 }
 
+static struct resp_res *rxe_prepare_read_res(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt)
+{
+	struct resp_res *res;
+	u32 pkts;
+
+	res = &qp->resp.resources[qp->resp.res_head];
+	rxe_advance_resp_resource(qp);
+	free_rd_atomic_resource(qp, res);
+
+	res->type = RXE_READ_MASK;
+	res->replay = 0;
+	res->read.va = qp->resp.va + qp->resp.offset;
+	res->read.va_org = qp->resp.va + qp->resp.offset;
+	res->read.resid = qp->resp.resid;
+	res->read.length = qp->resp.resid;
+	res->read.rkey = qp->resp.rkey;
+
+	pkts = max_t(u32, (reth_len(pkt) + qp->mtu - 1)/qp->mtu, 1);
+	res->first_psn = pkt->psn;
+	res->cur_psn = pkt->psn;
+	res->last_psn = (pkt->psn + pkts - 1) & BTH_PSN_MASK;
+
+	res->state = rdatm_res_state_new;
+
+	return res;
+}
+
+/**
+ * rxe_recheck_mr - revalidate MR from rkey and get a reference
+ * @qp: the qp
+ * @rkey: the rkey
+ *
+ * This code allows the MR to be invalidated or deregistered,
+ * or the MW, if one was used, to be invalidated or deallocated.
+ * It is assumed that the access permissions, if originally good,
+ * are still OK and that the mappings are unchanged.
+ *
+ * Return: mr on success else NULL
+ */
+static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_mr *mr;
+	struct rxe_mw *mw;
+
+	if (rkey_is_mw(rkey)) {
+		mw = rxe_pool_get_index(&rxe->mw_pool, rkey >> 8);
+		if (!mw || mw->rkey != rkey)
+			return NULL;
+
+		if (mw->state != RXE_MW_STATE_VALID) {
+			rxe_drop_ref(mw);
+			return NULL;
+		}
+
+		mr = mw->mr;
+		rxe_drop_ref(mw);
+	} else {
+		mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
+		if (!mr || mr->rkey != rkey)
+			return NULL;
+	}
+
+	if (mr->state != RXE_MR_STATE_VALID) {
+		rxe_drop_ref(mr);
+		return NULL;
+	}
+
+	return mr;
+}
+
 /* RDMA read response. If res is not NULL, then we have a current RDMA request
  * being processed or replayed.
  */
@@ -656,53 +728,26 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	int opcode;
 	int err;
 	struct resp_res *res = qp->resp.res;
+	struct rxe_mr *mr;
 
 	if (!res) {
-		/* This is the first time we process that request. Get a
-		 * resource
-		 */
-		res = &qp->resp.resources[qp->resp.res_head];
-
-		free_rd_atomic_resource(qp, res);
-		rxe_advance_resp_resource(qp);
-
-		res->type		= RXE_READ_MASK;
-		res->replay		= 0;
-
-		res->read.va		= qp->resp.va +
-					  qp->resp.offset;
-		res->read.va_org	= qp->resp.va +
-					  qp->resp.offset;
-
-		res->first_psn		= req_pkt->psn;
-
-		if (reth_len(req_pkt)) {
-			res->last_psn	= (req_pkt->psn +
-					   (reth_len(req_pkt) + mtu - 1) /
-					   mtu - 1) & BTH_PSN_MASK;
-		} else {
-			res->last_psn	= res->first_psn;
-		}
-		res->cur_psn		= req_pkt->psn;
-
-		res->read.resid		= qp->resp.resid;
-		res->read.length	= qp->resp.resid;
-		res->read.rkey		= qp->resp.rkey;
-
-		/* note res inherits the reference to mr from qp */
-		res->read.mr		= qp->resp.mr;
-		qp->resp.mr		= NULL;
-
-		qp->resp.res		= res;
-		res->state		= rdatm_res_state_new;
+		res = rxe_prepare_read_res(qp, req_pkt);
+		qp->resp.res = res;
 	}
 
 	if (res->state == rdatm_res_state_new) {
+		mr = qp->resp.mr;
+		qp->resp.mr = NULL;
+
 		if (res->read.resid <= mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
 		else
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
 	} else {
+		mr = rxe_recheck_mr(qp, res->read.rkey);
+		if (!mr)
+			return RESPST_ERR_RKEY_VIOLATION;
+
 		if (res->read.resid > mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
 		else
@@ -718,10 +763,12 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (!skb)
 		return RESPST_ERR_RNR;
 
-	err = rxe_mr_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt),
+	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
 		pr_err("Failed copying memory\n");
+	if (mr)
+		rxe_drop_ref(mr);
 
 	if (bth_pad(&ack_pkt)) {
 		u8 *pad = payload_addr(&ack_pkt) + payload;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 6b15251ff67a..e7eff1ca75e9 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -157,7 +157,6 @@ struct resp_res {
 			struct sk_buff	*skb;
 		} atomic;
 		struct {
-			struct rxe_mr	*mr;
 			u64		va_org;
 			u32		rkey;
 			u32		length;
-- 
2.32.0


* [PATCH for-next v11 03/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
  2022-03-04  0:07 ` [PATCH for-next v11 02/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
@ 2022-03-04  0:07 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 04/13] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:07 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

There is only one remaining object type that allocates its own
memory: the MR. So the sense of RXE_POOL_NO_ALLOC is reversed to
RXE_POOL_ALLOC. Checks are added to rxe_alloc() and rxe_add_to_pool()
to make sure the call used matches the setting of this flag.
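
For context, a minimal sketch of the two allocation styles the flag
now distinguishes (simplified; the pool names match those in rxe.c):

	/* RXE_POOL_ALLOC set (MR only): the pool allocates the memory */
	mr = rxe_alloc(&rxe->mr_pool);

	/* RXE_POOL_ALLOC clear: the object lives in memory the core
	 * verbs layer allocated and is only linked into the pool
	 */
	err = rxe_add_to_pool(&rxe->pd_pool, pd);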

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 21 ++++++++++++---------
 drivers/infiniband/sw/rxe/rxe_pool.h |  2 +-
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 16056b918ace..239c24544ff2 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -21,19 +21,17 @@ static const struct rxe_type_info {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
-		.flags		= RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 	},
@@ -41,7 +39,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 	},
@@ -50,7 +48,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 	},
@@ -58,7 +56,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-cq",
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 		.cleanup	= rxe_cq_cleanup,
 	},
 	[RXE_TYPE_MR] = {
@@ -66,7 +63,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 	},
@@ -75,7 +72,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
@@ -264,6 +261,9 @@ void *rxe_alloc(struct rxe_pool *pool)
 	struct rxe_pool_elem *elem;
 	void *obj;
 
+	if (WARN_ON(!(pool->flags & RXE_POOL_ALLOC)))
+		return NULL;
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
@@ -286,6 +286,9 @@ void *rxe_alloc(struct rxe_pool *pool)
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
+	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
+		return -EINVAL;
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
@@ -310,7 +313,7 @@ void rxe_elem_release(struct kref *kref)
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
-	if (!(pool->flags & RXE_POOL_NO_ALLOC)) {
+	if (pool->flags & RXE_POOL_ALLOC) {
 		obj = elem->obj;
 		kfree(obj);
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 8fc95c6b7b9b..44b944c8c360 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -9,7 +9,7 @@
 
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_NO_ALLOC	= BIT(4),
+	RXE_POOL_ALLOC		= BIT(2),
 };
 
 enum rxe_elem_type {
-- 
2.32.0


* [PATCH for-next v11 04/13] RDMA/rxe: Delete _locked() APIs for pool objects
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (2 preceding siblings ...)
  2022-03-04  0:07 ` [PATCH for-next v11 03/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 05/13] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Since caller-managed locks for indexed objects are no longer used,
these APIs are deleted.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 67 ++++------------------------
 drivers/infiniband/sw/rxe/rxe_pool.h | 24 ++--------
 2 files changed, 12 insertions(+), 79 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 239c24544ff2..2e3543dde000 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -189,17 +189,6 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 	return 0;
 }
 
-int __rxe_add_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	elem->index = alloc_index(pool);
-	err = rxe_insert_index(pool, elem);
-
-	return err;
-}
-
 int __rxe_add_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
@@ -207,55 +196,24 @@ int __rxe_add_index(struct rxe_pool_elem *elem)
 	int err;
 
 	write_lock_irqsave(&pool->pool_lock, flags);
-	err = __rxe_add_index_locked(elem);
+	elem->index = alloc_index(pool);
+	err = rxe_insert_index(pool, elem);
 	write_unlock_irqrestore(&pool->pool_lock, flags);
 
 	return err;
 }
 
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	clear_bit(elem->index - pool->index.min_index, pool->index.table);
-	rb_erase(&elem->index_node, &pool->index.tree);
-}
-
 void __rxe_drop_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 	unsigned long flags;
 
 	write_lock_irqsave(&pool->pool_lock, flags);
-	__rxe_drop_index_locked(elem);
+	clear_bit(elem->index - pool->index.min_index, pool->index.table);
+	rb_erase(&elem->index_node, &pool->index.tree);
 	write_unlock_irqrestore(&pool->pool_lock, flags);
 }
 
-void *rxe_alloc_locked(struct rxe_pool *pool)
-{
-	struct rxe_pool_elem *elem;
-	void *obj;
-
-	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
-
-	obj = kzalloc(pool->elem_size, GFP_ATOMIC);
-	if (!obj)
-		goto out_cnt;
-
-	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
-
-	elem->pool = pool;
-	elem->obj = obj;
-	kref_init(&elem->ref_cnt);
-
-	return obj;
-
-out_cnt:
-	atomic_dec(&pool->num_elem);
-	return NULL;
-}
-
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
@@ -321,12 +279,14 @@ void rxe_elem_release(struct kref *kref)
 	atomic_dec(&pool->num_elem);
 }
 
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
-	struct rb_node *node;
 	struct rxe_pool_elem *elem;
+	struct rb_node *node;
+	unsigned long flags;
 	void *obj;
 
+	read_lock_irqsave(&pool->pool_lock, flags);
 	node = pool->index.tree.rb_node;
 
 	while (node) {
@@ -346,17 +306,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 	} else {
 		obj = NULL;
 	}
-
-	return obj;
-}
-
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
-{
-	unsigned long flags;
-	void *obj;
-
-	read_lock_irqsave(&pool->pool_lock, flags);
-	obj = rxe_pool_get_index_locked(pool, index);
 	read_unlock_irqrestore(&pool->pool_lock, flags);
 
 	return obj;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 44b944c8c360..7fec5d96d695 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -68,9 +68,7 @@ int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 /* free resources from object pool */
 void rxe_pool_cleanup(struct rxe_pool *pool);
 
-/* allocate an object from pool holding and not holding the pool lock */
-void *rxe_alloc_locked(struct rxe_pool *pool);
-
+/* allocate an object from pool */
 void *rxe_alloc(struct rxe_pool *pool);
 
 /* connect already allocated object to pool */
@@ -79,32 +77,18 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
 /* assign an index to an indexed object and insert object into
- *  pool's rb tree holding and not holding the pool_lock
+ * pool's rb tree
  */
-int __rxe_add_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->elem)
-
 int __rxe_add_index(struct rxe_pool_elem *elem);
 
 #define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
 
-/* drop an index and remove object from rb tree
- * holding and not holding the pool_lock
- */
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->elem)
-
+/* drop an index and remove object from rb tree */
 void __rxe_drop_index(struct rxe_pool_elem *elem);
 
 #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
-/* lookup an indexed object from index holding and not holding the pool_lock.
- * takes a reference on object
- */
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
-
+/* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
 /* cleanup an object when all references are dropped */
-- 
2.32.0


* [PATCH for-next v11 05/13] RDMA/rxe: Replace obj by elem in declaration
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (3 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 04/13] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 06/13] RDMA/rxe: Move max_elem into rxe_type_info Bob Pearson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Fix a harmless naming mismatch by replacing obj with elem in the
declarations of the cleanup callbacks. This has no functional effect
but the inconsistency is confusing.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 2e3543dde000..3b50fd3d9d70 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -12,7 +12,7 @@ static const struct rxe_type_info {
 	const char *name;
 	size_t size;
 	size_t elem_offset;
-	void (*cleanup)(struct rxe_pool_elem *obj);
+	void (*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 7fec5d96d695..a8582ad85b1e 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -39,7 +39,7 @@ struct rxe_pool {
 	struct rxe_dev		*rxe;
 	const char		*name;
 	rwlock_t		pool_lock; /* protects pool add/del/search */
-	void			(*cleanup)(struct rxe_pool_elem *obj);
+	void			(*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
 
-- 
2.32.0


* [PATCH for-next v11 06/13] RDMA/rxe: Move max_elem into rxe_type_info
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (4 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 05/13] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 07/13] RDMA/rxe: Shorten pool names in rxe_pool.c Bob Pearson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Move the maximum number of elements from a parameter in rxe_pool_init
to a member of the rxe_type_info array.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c      | 24 ++++++++----------------
 drivers/infiniband/sw/rxe/rxe_pool.c | 14 +++++++++++---
 drivers/infiniband/sw/rxe/rxe_pool.h |  2 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index fce3994d8f7a..dc1f9dd70966 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -118,43 +118,35 @@ static int rxe_init_pools(struct rxe_dev *rxe)
 {
 	int err;
 
-	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC,
-			    rxe->max_ucontext);
+	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC);
 	if (err)
 		goto err1;
 
-	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD,
-			    rxe->attr.max_pd);
+	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD);
 	if (err)
 		goto err2;
 
-	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH,
-			    rxe->attr.max_ah);
+	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH);
 	if (err)
 		goto err3;
 
-	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ,
-			    rxe->attr.max_srq);
+	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ);
 	if (err)
 		goto err4;
 
-	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP,
-			    rxe->attr.max_qp);
+	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP);
 	if (err)
 		goto err5;
 
-	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ,
-			    rxe->attr.max_cq);
+	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ);
 	if (err)
 		goto err6;
 
-	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR,
-			    rxe->attr.max_mr);
+	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR);
 	if (err)
 		goto err7;
 
-	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW,
-			    rxe->attr.max_mw);
+	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW);
 	if (err)
 		goto err8;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 3b50fd3d9d70..bc3ae64adba8 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -16,16 +16,19 @@ static const struct rxe_type_info {
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
+	u32 max_elem;
 } rxe_type_info[RXE_NUM_TYPES] = {
 	[RXE_TYPE_UC] = {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
+		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
+		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
@@ -34,6 +37,7 @@ static const struct rxe_type_info {
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
+		.max_elem	= RXE_MAX_AH_INDEX - RXE_MIN_AH_INDEX + 1,
 	},
 	[RXE_TYPE_SRQ] = {
 		.name		= "rxe-srq",
@@ -42,6 +46,7 @@ static const struct rxe_type_info {
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
+		.max_elem	= RXE_MAX_SRQ_INDEX - RXE_MIN_SRQ_INDEX + 1,
 	},
 	[RXE_TYPE_QP] = {
 		.name		= "rxe-qp",
@@ -51,12 +56,14 @@ static const struct rxe_type_info {
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
+		.max_elem	= RXE_MAX_QP_INDEX - RXE_MIN_QP_INDEX + 1,
 	},
 	[RXE_TYPE_CQ] = {
 		.name		= "rxe-cq",
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
 		.cleanup	= rxe_cq_cleanup,
+		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_MR] = {
 		.name		= "rxe-mr",
@@ -66,6 +73,7 @@ static const struct rxe_type_info {
 		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
+		.max_elem	= RXE_MAX_MR_INDEX - RXE_MIN_MR_INDEX + 1,
 	},
 	[RXE_TYPE_MW] = {
 		.name		= "rxe-mw",
@@ -75,6 +83,7 @@ static const struct rxe_type_info {
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
+		.max_elem	= RXE_MAX_MW_INDEX - RXE_MIN_MW_INDEX + 1,
 	},
 };
 
@@ -104,8 +113,7 @@ static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
 int rxe_pool_init(
 	struct rxe_dev		*rxe,
 	struct rxe_pool		*pool,
-	enum rxe_elem_type	type,
-	unsigned int		max_elem)
+	enum rxe_elem_type	type)
 {
 	const struct rxe_type_info *info = &rxe_type_info[type];
 	int			err = 0;
@@ -115,7 +123,7 @@ int rxe_pool_init(
 	pool->rxe		= rxe;
 	pool->name		= info->name;
 	pool->type		= type;
-	pool->max_elem		= max_elem;
+	pool->max_elem		= info->max_elem;
 	pool->elem_size		= ALIGN(info->size, RXE_POOL_ALIGN);
 	pool->elem_offset	= info->elem_offset;
 	pool->flags		= info->flags;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index a8582ad85b1e..5f34d232d7f4 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -63,7 +63,7 @@ struct rxe_pool {
  * pool elements will be allocated out of a slab cache
  */
 int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
-		  enum rxe_elem_type type, u32 max_elem);
+		  enum rxe_elem_type type);
 
 /* free resources from object pool */
 void rxe_pool_cleanup(struct rxe_pool *pool);
-- 
2.32.0


* [PATCH for-next v11 07/13] RDMA/rxe: Shorten pool names in rxe_pool.c
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (5 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 06/13] RDMA/rxe: Move max_elem into rxe_type_info Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace pool names like "rxe-xx" with "xx". This just reduces clutter.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index bc3ae64adba8..c50baeb10bd2 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -19,19 +19,19 @@ static const struct rxe_type_info {
 	u32 max_elem;
 } rxe_type_info[RXE_NUM_TYPES] = {
 	[RXE_TYPE_UC] = {
-		.name		= "rxe-uc",
+		.name		= "uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_PD] = {
-		.name		= "rxe-pd",
+		.name		= "pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_AH] = {
-		.name		= "rxe-ah",
+		.name		= "ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
 		.flags		= RXE_POOL_INDEX,
@@ -40,7 +40,7 @@ static const struct rxe_type_info {
 		.max_elem	= RXE_MAX_AH_INDEX - RXE_MIN_AH_INDEX + 1,
 	},
 	[RXE_TYPE_SRQ] = {
-		.name		= "rxe-srq",
+		.name		= "srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
 		.flags		= RXE_POOL_INDEX,
@@ -49,7 +49,7 @@ static const struct rxe_type_info {
 		.max_elem	= RXE_MAX_SRQ_INDEX - RXE_MIN_SRQ_INDEX + 1,
 	},
 	[RXE_TYPE_QP] = {
-		.name		= "rxe-qp",
+		.name		= "qp",
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
@@ -59,14 +59,14 @@ static const struct rxe_type_info {
 		.max_elem	= RXE_MAX_QP_INDEX - RXE_MIN_QP_INDEX + 1,
 	},
 	[RXE_TYPE_CQ] = {
-		.name		= "rxe-cq",
+		.name		= "cq",
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
 		.cleanup	= rxe_cq_cleanup,
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_MR] = {
-		.name		= "rxe-mr",
+		.name		= "mr",
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
@@ -76,7 +76,7 @@ static const struct rxe_type_info {
 		.max_elem	= RXE_MAX_MR_INDEX - RXE_MIN_MR_INDEX + 1,
 	},
 	[RXE_TYPE_MW] = {
-		.name		= "rxe-mw",
+		.name		= "mw",
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
-- 
2.32.0


* [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (6 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 07/13] RDMA/rxe: Shorten pool names in rxe_pool.c Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-15 23:45   ` Jason Gunthorpe
  2022-03-04  0:08 ` [PATCH for-next v11 09/13] RDMA/rxe: Use standard names for ref counting Bob Pearson
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rxe driver uses red-black trees to add indices to the rxe
object pools. Linux xarrays provide a better way to implement the same
functionality for indices. This patch replaces the red-black trees with
xarrays for pool objects. Since xarrays already have a spinlock, use
that in place of the pool rwlock. Make sure that all changes to the
xarray (index) and the kref (ref count) occur atomically.
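
Two fragments, matching the diff below, sketch the pattern. Index
allocation happens when the object joins the pool, and lookup pairs
xa_load() with kref_get_unless_zero() so that an object whose ref
count has already reached zero, but whose index has not yet been
erased, cannot be revived:

	/* add side: allocate a free index in [min, max], cycling */
	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem,
			      pool->limit, &pool->next, GFP_KERNEL);
	if (err)
		goto err_cnt;

	/* lookup side: index -> object, only if still referenced */
	xa_lock_irqsave(&pool->xa, flags);
	elem = xa_load(&pool->xa, index);
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	xa_unlock_irqrestore(&pool->xa, flags);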

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |  80 ++-------
 drivers/infiniband/sw/rxe/rxe_mr.c    |   1 -
 drivers/infiniband/sw/rxe/rxe_mw.c    |   8 -
 drivers/infiniband/sw/rxe/rxe_pool.c  | 236 +++++++++-----------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  43 ++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  12 --
 6 files changed, 103 insertions(+), 277 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index dc1f9dd70966..2dae7538a2ea 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -114,75 +114,26 @@ static void rxe_init_ports(struct rxe_dev *rxe)
 }
 
 /* init pools of managed objects */
-static int rxe_init_pools(struct rxe_dev *rxe)
+static void rxe_init_pools(struct rxe_dev *rxe)
 {
-	int err;
-
-	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC);
-	if (err)
-		goto err1;
-
-	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD);
-	if (err)
-		goto err2;
-
-	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH);
-	if (err)
-		goto err3;
-
-	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ);
-	if (err)
-		goto err4;
-
-	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP);
-	if (err)
-		goto err5;
-
-	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ);
-	if (err)
-		goto err6;
-
-	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR);
-	if (err)
-		goto err7;
-
-	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW);
-	if (err)
-		goto err8;
-
-	return 0;
-
-err8:
-	rxe_pool_cleanup(&rxe->mr_pool);
-err7:
-	rxe_pool_cleanup(&rxe->cq_pool);
-err6:
-	rxe_pool_cleanup(&rxe->qp_pool);
-err5:
-	rxe_pool_cleanup(&rxe->srq_pool);
-err4:
-	rxe_pool_cleanup(&rxe->ah_pool);
-err3:
-	rxe_pool_cleanup(&rxe->pd_pool);
-err2:
-	rxe_pool_cleanup(&rxe->uc_pool);
-err1:
-	return err;
+	rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC);
+	rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD);
+	rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH);
+	rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ);
+	rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP);
+	rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ);
+	rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR);
+	rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW);
 }
 
 /* initialize rxe device state */
-static int rxe_init(struct rxe_dev *rxe)
+static void rxe_init(struct rxe_dev *rxe)
 {
-	int err;
-
 	/* init default device parameters */
 	rxe_init_device_param(rxe);
 
 	rxe_init_ports(rxe);
-
-	err = rxe_init_pools(rxe);
-	if (err)
-		return err;
+	rxe_init_pools(rxe);
 
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
@@ -194,8 +145,6 @@ static int rxe_init(struct rxe_dev *rxe)
 	rxe->mcg_tree = RB_ROOT;
 
 	mutex_init(&rxe->usdev_lock);
-
-	return 0;
 }
 
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
@@ -217,12 +166,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
  */
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name)
 {
-	int err;
-
-	err = rxe_init(rxe);
-	if (err)
-		return err;
-
+	rxe_init(rxe);
 	rxe_set_mtu(rxe, mtu);
 
 	return rxe_register_device(rxe, ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..35628b8a00b4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -691,7 +691,6 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_drop_ref(mr_pd(mr));
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 
 	return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 32dd8c0b8b9e..7df36c40eec2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -20,7 +20,6 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 		return ret;
 	}
 
-	rxe_add_index(mw);
 	mw->rkey = ibmw->rkey = (mw->elem.index << 8) | rxe_get_next_key(-1);
 	mw->state = (mw->ibmw.type == IB_MW_TYPE_2) ?
 			RXE_MW_STATE_FREE : RXE_MW_STATE_VALID;
@@ -329,10 +328,3 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 
 	return mw;
 }
-
-void rxe_mw_cleanup(struct rxe_pool_elem *elem)
-{
-	struct rxe_mw *mw = container_of(elem, typeof(*mw), elem);
-
-	rxe_drop_index(mw);
-}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index c50baeb10bd2..b0dfeb08a470 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -22,19 +22,22 @@ static const struct rxe_type_info {
 		.name		= "uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 		.max_elem	= RXE_MAX_AH_INDEX - RXE_MIN_AH_INDEX + 1,
@@ -43,7 +46,6 @@ static const struct rxe_type_info {
 		.name		= "srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 		.max_elem	= RXE_MAX_SRQ_INDEX - RXE_MIN_SRQ_INDEX + 1,
@@ -53,7 +55,6 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 		.max_elem	= RXE_MAX_QP_INDEX - RXE_MIN_QP_INDEX + 1,
@@ -63,6 +64,8 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
 		.cleanup	= rxe_cq_cleanup,
+		.min_index	= 1,
+		.max_index	= UINT_MAX,
 		.max_elem	= UINT_MAX,
 	},
 	[RXE_TYPE_MR] = {
@@ -70,7 +73,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
+		.flags		= RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 		.max_elem	= RXE_MAX_MR_INDEX - RXE_MIN_MR_INDEX + 1,
@@ -79,44 +82,16 @@ static const struct rxe_type_info {
 		.name		= "mw",
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
-		.cleanup	= rxe_mw_cleanup,
-		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 		.max_elem	= RXE_MAX_MW_INDEX - RXE_MIN_MW_INDEX + 1,
 	},
 };
 
-static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
-{
-	int err = 0;
-
-	if ((max - min + 1) < pool->max_elem) {
-		pr_warn("not enough indices for max_elem\n");
-		err = -EINVAL;
-		goto out;
-	}
-
-	pool->index.max_index = max;
-	pool->index.min_index = min;
-
-	pool->index.table = bitmap_zalloc(max - min + 1, GFP_KERNEL);
-	if (!pool->index.table) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-out:
-	return err;
-}
-
-int rxe_pool_init(
-	struct rxe_dev		*rxe,
-	struct rxe_pool		*pool,
-	enum rxe_elem_type	type)
+void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+		   enum rxe_elem_type type)
 {
 	const struct rxe_type_info *info = &rxe_type_info[type];
-	int			err = 0;
 
 	memset(pool, 0, sizeof(*pool));
 
@@ -131,111 +106,52 @@ int rxe_pool_init(
 
 	atomic_set(&pool->num_elem, 0);
 
-	rwlock_init(&pool->pool_lock);
-
-	if (pool->flags & RXE_POOL_INDEX) {
-		pool->index.tree = RB_ROOT;
-		err = rxe_pool_init_index(pool, info->max_index,
-					  info->min_index);
-		if (err)
-			goto out;
-	}
-
-out:
-	return err;
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	pool->limit.min = info->min_index;
+	pool->limit.max = info->max_index;
 }
 
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
-	if (atomic_read(&pool->num_elem) > 0)
-		pr_warn("%s pool destroyed with unfree'd elem\n",
-			pool->name);
-
-	if (pool->flags & RXE_POOL_INDEX)
-		bitmap_free(pool->index.table);
-}
-
-static u32 alloc_index(struct rxe_pool *pool)
-{
-	u32 index;
-	u32 range = pool->index.max_index - pool->index.min_index + 1;
-
-	index = find_next_zero_bit(pool->index.table, range, pool->index.last);
-	if (index >= range)
-		index = find_first_zero_bit(pool->index.table, range);
-
-	WARN_ON_ONCE(index >= range);
-	set_bit(index, pool->index.table);
-	pool->index.last = index;
-	return index + pool->index.min_index;
-}
-
-static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->index.tree.rb_node;
-	struct rb_node *parent = NULL;
 	struct rxe_pool_elem *elem;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, index_node);
-
-		if (elem->index == new->index) {
-			pr_warn("element already exists!\n");
-			return -EINVAL;
+	struct xarray *xa = &pool->xa;
+	unsigned long index = 0;
+	unsigned long max = ULONG_MAX;
+	unsigned int elem_count = 0;
+	unsigned int obj_count = 0;
+
+	do {
+		elem = xa_find(xa, &index, max, XA_PRESENT);
+		if (elem) {
+			elem_count++;
+			xa_erase(xa, index);
+			if (pool->flags & RXE_POOL_ALLOC) {
+				kfree(elem->obj);
+				obj_count++;
+			}
 		}
+	} while (elem);
 
-		if (elem->index > new->index)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->index_node, parent, link);
-	rb_insert_color(&new->index_node, &pool->index.tree);
-
-	return 0;
-}
-
-int __rxe_add_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
-	int err;
-
-	write_lock_irqsave(&pool->pool_lock, flags);
-	elem->index = alloc_index(pool);
-	err = rxe_insert_index(pool, elem);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
-
-	return err;
-}
-
-void __rxe_drop_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
-
-	write_lock_irqsave(&pool->pool_lock, flags);
-	clear_bit(elem->index - pool->index.min_index, pool->index.table);
-	rb_erase(&elem->index_node, &pool->index.tree);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	if (WARN_ON(elem_count || obj_count))
+		pr_debug("Freed %d indices and %d objects from pool %s\n",
+			elem_count, obj_count, pool->name);
 }
 
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
+	int err;
 
 	if (WARN_ON(!(pool->flags & RXE_POOL_ALLOC)))
 		return NULL;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_cnt;
 
 	obj = kzalloc(pool->elem_size, GFP_KERNEL);
 	if (!obj)
-		goto out_cnt;
+		goto err_cnt;
 
 	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
@@ -243,78 +159,86 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
+	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+			      &pool->next, GFP_KERNEL);
+	if (err)
+		goto err_free;
+
 	return obj;
 
-out_cnt:
+err_free:
+	kfree(obj);
+err_cnt:
 	atomic_dec(&pool->num_elem);
 	return NULL;
 }
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
+	int err;
+
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_cnt;
 
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
+	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+			      &pool->next, GFP_KERNEL);
+	if (err)
+		goto err_cnt;
+
 	return 0;
 
-out_cnt:
+err_cnt:
 	atomic_dec(&pool->num_elem);
 	return -EINVAL;
 }
 
-void rxe_elem_release(struct kref *kref)
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
-	struct rxe_pool_elem *elem =
-		container_of(kref, struct rxe_pool_elem, ref_cnt);
-	struct rxe_pool *pool = elem->pool;
+	struct rxe_pool_elem *elem;
+	struct xarray *xa = &pool->xa;
+	unsigned long flags;
 	void *obj;
 
-	if (pool->cleanup)
-		pool->cleanup(elem);
-
-	if (pool->flags & RXE_POOL_ALLOC) {
+	xa_lock_irqsave(xa, flags);
+	elem = xa_load(xa, index);
+	if (elem && kref_get_unless_zero(&elem->ref_cnt))
 		obj = elem->obj;
-		kfree(obj);
-	}
+	else
+		obj = NULL;
+	xa_unlock_irqrestore(xa, flags);
 
-	atomic_dec(&pool->num_elem);
+	return obj;
 }
 
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
+static void rxe_elem_release(struct kref *kref)
 {
-	struct rxe_pool_elem *elem;
-	struct rb_node *node;
-	unsigned long flags;
-	void *obj;
+	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
+	struct rxe_pool *pool = elem->pool;
 
-	read_lock_irqsave(&pool->pool_lock, flags);
-	node = pool->index.tree.rb_node;
+	xa_erase(&pool->xa, elem->index);
 
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, index_node);
+	if (pool->cleanup)
+		pool->cleanup(elem);
 
-		if (elem->index > index)
-			node = node->rb_left;
-		else if (elem->index < index)
-			node = node->rb_right;
-		else
-			break;
-	}
+	if (pool->flags & RXE_POOL_ALLOC)
+		kfree(elem->obj);
 
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
-	read_unlock_irqrestore(&pool->pool_lock, flags);
+	atomic_dec(&pool->num_elem);
+}
 
-	return obj;
+int __rxe_get(struct rxe_pool_elem *elem)
+{
+	return kref_get_unless_zero(&elem->ref_cnt);
+}
+
+int __rxe_put(struct rxe_pool_elem *elem)
+{
+	return kref_put(&elem->ref_cnt, rxe_elem_release);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 5f34d232d7f4..d1e05d384b2c 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -8,8 +8,7 @@
 #define RXE_POOL_H
 
 enum rxe_pool_flags {
-	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_ALLOC		= BIT(2),
+	RXE_POOL_ALLOC		= BIT(1),
 };
 
 enum rxe_elem_type {
@@ -29,16 +28,12 @@ struct rxe_pool_elem {
 	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
-
-	/* only used if indexed */
-	struct rb_node		index_node;
 	u32			index;
 };
 
 struct rxe_pool {
 	struct rxe_dev		*rxe;
 	const char		*name;
-	rwlock_t		pool_lock; /* protects pool add/del/search */
 	void			(*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
@@ -48,21 +43,16 @@ struct rxe_pool {
 	size_t			elem_size;
 	size_t			elem_offset;
 
-	/* only used if indexed */
-	struct {
-		struct rb_root		tree;
-		unsigned long		*table;
-		u32			last;
-		u32			max_index;
-		u32			min_index;
-	} index;
+	struct xarray		xa;
+	struct xa_limit		limit;
+	u32			next;
 };
 
 /* initialize a pool of objects with given limit on
  * number of elements. gets parameters from rxe_type_info
  * pool elements will be allocated out of a slab cache
  */
-int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 		  enum rxe_elem_type type);
 
 /* free resources from object pool */
@@ -76,29 +66,18 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
-/* assign an index to an indexed object and insert object into
- * pool's rb tree
- */
-int __rxe_add_index(struct rxe_pool_elem *elem);
-
-#define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
-
-/* drop an index and remove object from rb tree */
-void __rxe_drop_index(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
-
 /* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* cleanup an object when all references are dropped */
-void rxe_elem_release(struct kref *kref);
-
 /* take a reference on an object */
-#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
+int __rxe_get(struct rxe_pool_elem *elem);
+
+#define rxe_add_ref(obj) __rxe_get(&(obj)->elem)
 
 /* drop a reference on an object */
-#define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
+int __rxe_put(struct rxe_pool_elem *elem);
+
+#define rxe_drop_ref(obj) __rxe_put(&(obj)->elem)
 
 #define rxe_read_ref(obj) kref_read(&(obj)->elem.ref_cnt)
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 80df9a8f71a1..f0c5715ac500 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -181,7 +181,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		return err;
 
 	/* create index > 0 */
-	rxe_add_index(ah);
 	ah->ah_num = ah->elem.index;
 
 	if (uresp) {
@@ -189,7 +188,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
 					 sizeof(uresp->ah_num));
 		if (err) {
-			rxe_drop_index(ah);
 			rxe_drop_ref(ah);
 			return -EFAULT;
 		}
@@ -230,7 +228,6 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
-	rxe_drop_index(ah);
 	rxe_drop_ref(ah);
 	return 0;
 }
@@ -438,7 +435,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	if (err)
 		return err;
 
-	rxe_add_index(qp);
 	err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibqp->pd, udata);
 	if (err)
 		goto qp_init;
@@ -446,7 +442,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	return 0;
 
 qp_init:
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return err;
 }
@@ -501,7 +496,6 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 		return ret;
 
 	rxe_qp_destroy(qp);
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return 0;
 }
@@ -908,7 +902,6 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	rxe_add_index(mr);
 	rxe_add_ref(pd);
 	rxe_mr_init_dma(pd, access, mr);
 
@@ -932,7 +925,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 		goto err2;
 	}
 
-	rxe_add_index(mr);
 
 	rxe_add_ref(pd);
 
@@ -944,7 +936,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 
 err3:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err2:
 	return ERR_PTR(err);
@@ -967,8 +958,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		goto err1;
 	}
 
-	rxe_add_index(mr);
-
 	rxe_add_ref(pd);
 
 	err = rxe_mr_init_fast(pd, max_num_sg, mr);
@@ -979,7 +968,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 
 err2:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err1:
 	return ERR_PTR(err);
-- 
2.32.0



* [PATCH for-next v11 09/13] RDMA/rxe: Use standard names for ref counting
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (7 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-04  0:08 ` [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects Bob Pearson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put().
This significantly improves readability for new readers.
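
For illustration, a made-up call site before and after the rename
(the real call sites are in the diff below):

	rxe_add_ref(qp);	/* old name */
	rxe_drop_ref(qp);	/* old name */

	rxe_get(qp);		/* new name; same kref semantics */
	rxe_put(qp);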

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c    |  4 +--
 drivers/infiniband/sw/rxe/rxe_comp.c  |  8 +++---
 drivers/infiniband/sw/rxe/rxe_mcast.c |  4 +--
 drivers/infiniband/sw/rxe/rxe_mr.c    | 14 +++++-----
 drivers/infiniband/sw/rxe/rxe_mw.c    | 30 ++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_net.c   |  6 ++---
 drivers/infiniband/sw/rxe/rxe_pool.h  |  8 +++---
 drivers/infiniband/sw/rxe/rxe_qp.c    | 28 ++++++++++----------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  8 +++---
 drivers/infiniband/sw/rxe/rxe_req.c   | 10 +++----
 drivers/infiniband/sw/rxe/rxe_resp.c  | 32 +++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_verbs.c | 38 +++++++++++++--------------
 12 files changed, 94 insertions(+), 96 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 360a567159fe..3b05314ca739 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -127,14 +127,14 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp)
 
 		if (rxe_ah_pd(ah) != pkt->qp->pd) {
 			pr_warn("PDs don't match for AH and QP\n");
-			rxe_drop_ref(ah);
+			rxe_put(ah);
 			return NULL;
 		}
 
 		if (ahp)
 			*ahp = ah;
 		else
-			rxe_drop_ref(ah);
+			rxe_put(ah);
 
 		return &ah->av;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index f363fe3fa414..138b3e7d3a5f 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -526,7 +526,7 @@ static void rxe_drain_resp_pkts(struct rxe_qp *qp, bool notify)
 	struct rxe_queue *q = qp->sq.queue;
 
 	while ((skb = skb_dequeue(&qp->resp_pkts))) {
-		rxe_drop_ref(qp);
+		rxe_put(qp);
 		kfree_skb(skb);
 		ib_device_put(qp->ibqp.device);
 	}
@@ -548,7 +548,7 @@ static void free_pkt(struct rxe_pkt_info *pkt)
 	struct ib_device *dev = qp->ibqp.device;
 
 	kfree_skb(skb);
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 	ib_device_put(dev);
 }
 
@@ -562,7 +562,7 @@ int rxe_completer(void *arg)
 	enum comp_state state;
 	int ret = 0;
 
-	rxe_add_ref(qp);
+	rxe_get(qp);
 
 	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
 	    qp->req.state == QP_STATE_RESET) {
@@ -761,7 +761,7 @@ int rxe_completer(void *arg)
 done:
 	if (pkt)
 		free_pkt(pkt);
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 
 	return ret;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index c399a29b648b..ae8f11cb704a 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -319,7 +319,7 @@ static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
 
 	atomic_inc(&qp->mcg_num);
 
-	rxe_add_ref(qp);
+	rxe_get(qp);
 	mca->qp = qp;
 
 	list_add_tail(&mca->qp_list, &mcg->qp_list);
@@ -389,7 +389,7 @@ static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
 	atomic_dec(&mcg->qp_num);
 	atomic_dec(&mcg->rxe->mcg_attach);
 	atomic_dec(&mca->qp->mcg_num);
-	rxe_drop_ref(mca->qp);
+	rxe_put(mca->qp);
 
 	kfree(mca);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 35628b8a00b4..60a31b718774 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -459,7 +459,7 @@ int copy_data(
 
 		if (offset >= sge->length) {
 			if (mr) {
-				rxe_drop_ref(mr);
+				rxe_put(mr);
 				mr = NULL;
 			}
 			sge++;
@@ -504,13 +504,13 @@ int copy_data(
 	dma->resid	= resid;
 
 	if (mr)
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 
 	return 0;
 
 err2:
 	if (mr)
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 err1:
 	return err;
 }
@@ -569,7 +569,7 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
 		     (type == RXE_LOOKUP_REMOTE && mr->rkey != key) ||
 		     mr_pd(mr) != pd || (access && !(access & mr->access)) ||
 		     mr->state != RXE_MR_STATE_VALID)) {
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 		mr = NULL;
 	}
 
@@ -613,7 +613,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
 	ret = 0;
 
 err_drop_ref:
-	rxe_drop_ref(mr);
+	rxe_put(mr);
 err:
 	return ret;
 }
@@ -690,8 +690,8 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 	}
 
 	mr->state = RXE_MR_STATE_INVALID;
-	rxe_drop_ref(mr_pd(mr));
-	rxe_drop_ref(mr);
+	rxe_put(mr_pd(mr));
+	rxe_put(mr);
 
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 7df36c40eec2..c86b2efd58f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -12,11 +12,11 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 	struct rxe_dev *rxe = to_rdev(ibmw->device);
 	int ret;
 
-	rxe_add_ref(pd);
+	rxe_get(pd);
 
 	ret = rxe_add_to_pool(&rxe->mw_pool, mw);
 	if (ret) {
-		rxe_drop_ref(pd);
+		rxe_put(pd);
 		return ret;
 	}
 
@@ -35,14 +35,14 @@ static void rxe_do_dealloc_mw(struct rxe_mw *mw)
 
 		mw->mr = NULL;
 		atomic_dec(&mr->num_mw);
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 	}
 
 	if (mw->qp) {
 		struct rxe_qp *qp = mw->qp;
 
 		mw->qp = NULL;
-		rxe_drop_ref(qp);
+		rxe_put(qp);
 	}
 
 	mw->access = 0;
@@ -60,8 +60,8 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
 	rxe_do_dealloc_mw(mw);
 	spin_unlock_bh(&mw->lock);
 
-	rxe_drop_ref(mw);
-	rxe_drop_ref(pd);
+	rxe_put(mw);
+	rxe_put(pd);
 
 	return 0;
 }
@@ -170,7 +170,7 @@ static void rxe_do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 	mw->length = wqe->wr.wr.mw.length;
 
 	if (mw->mr) {
-		rxe_drop_ref(mw->mr);
+		rxe_put(mw->mr);
 		atomic_dec(&mw->mr->num_mw);
 		mw->mr = NULL;
 	}
@@ -178,11 +178,11 @@ static void rxe_do_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 	if (mw->length) {
 		mw->mr = mr;
 		atomic_inc(&mr->num_mw);
-		rxe_add_ref(mr);
+		rxe_get(mr);
 	}
 
 	if (mw->ibmw.type == IB_MW_TYPE_2) {
-		rxe_add_ref(qp);
+		rxe_get(qp);
 		mw->qp = qp;
 	}
 }
@@ -233,9 +233,9 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	spin_unlock_bh(&mw->lock);
 err_drop_mr:
 	if (mr)
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 err_drop_mw:
-	rxe_drop_ref(mw);
+	rxe_put(mw);
 err:
 	return ret;
 }
@@ -260,13 +260,13 @@ static void rxe_do_invalidate_mw(struct rxe_mw *mw)
 	/* valid type 2 MW will always have a QP pointer */
 	qp = mw->qp;
 	mw->qp = NULL;
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 
 	/* valid type 2 MW will always have an MR pointer */
 	mr = mw->mr;
 	mw->mr = NULL;
 	atomic_dec(&mr->num_mw);
-	rxe_drop_ref(mr);
+	rxe_put(mr);
 
 	mw->access = 0;
 	mw->addr = 0;
@@ -301,7 +301,7 @@ int rxe_invalidate_mw(struct rxe_qp *qp, u32 rkey)
 err_unlock:
 	spin_unlock_bh(&mw->lock);
 err_drop_ref:
-	rxe_drop_ref(mw);
+	rxe_put(mw);
 err:
 	return ret;
 }
@@ -322,7 +322,7 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 		     (mw->length == 0) ||
 		     (access && !(access & mw->access)) ||
 		     mw->state != RXE_MW_STATE_VALID)) {
-		rxe_drop_ref(mw);
+		rxe_put(mw);
 		return NULL;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index b06f22ffc5a8..c53f4529f098 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -348,7 +348,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
 		     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
 		rxe_run_task(&qp->req.task, 1);
 
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 }
 
 static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
@@ -358,7 +358,7 @@ static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	skb->destructor = rxe_skb_tx_dtor;
 	skb->sk = pkt->qp->sk->sk;
 
-	rxe_add_ref(pkt->qp);
+	rxe_get(pkt->qp);
 	atomic_inc(&pkt->qp->skb_out);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -368,7 +368,7 @@ static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	} else {
 		pr_err("Unknown layer 3 protocol: %d\n", skb->protocol);
 		atomic_dec(&pkt->qp->skb_out);
-		rxe_drop_ref(pkt->qp);
+		rxe_put(pkt->qp);
 		kfree_skb(skb);
 		return -EINVAL;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index d1e05d384b2c..24bcc786c1b3 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -69,16 +69,14 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 /* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
-/* take a reference on an object */
 int __rxe_get(struct rxe_pool_elem *elem);
 
-#define rxe_add_ref(obj) __rxe_get(&(obj)->elem)
+#define rxe_get(obj) __rxe_get(&(obj)->elem)
 
-/* drop a reference on an object */
 int __rxe_put(struct rxe_pool_elem *elem);
 
-#define rxe_drop_ref(obj) __rxe_put(&(obj)->elem)
+#define rxe_put(obj) __rxe_put(&(obj)->elem)
 
-#define rxe_read_ref(obj) kref_read(&(obj)->elem.ref_cnt)
+#define rxe_read(obj) kref_read(&(obj)->elem.ref_cnt)
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 26d461a8d71c..62acf890af6c 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -323,11 +323,11 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 	struct rxe_cq *scq = to_rcq(init->send_cq);
 	struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL;
 
-	rxe_add_ref(pd);
-	rxe_add_ref(rcq);
-	rxe_add_ref(scq);
+	rxe_get(pd);
+	rxe_get(rcq);
+	rxe_get(scq);
 	if (srq)
-		rxe_add_ref(srq);
+		rxe_get(srq);
 
 	qp->pd			= pd;
 	qp->rcq			= rcq;
@@ -359,10 +359,10 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 	qp->srq = NULL;
 
 	if (srq)
-		rxe_drop_ref(srq);
-	rxe_drop_ref(scq);
-	rxe_drop_ref(rcq);
-	rxe_drop_ref(pd);
+		rxe_put(srq);
+	rxe_put(scq);
+	rxe_put(rcq);
+	rxe_put(pd);
 
 	return err;
 }
@@ -521,7 +521,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	qp->resp.sent_psn_nak = 0;
 
 	if (qp->resp.mr) {
-		rxe_drop_ref(qp->resp.mr);
+		rxe_put(qp->resp.mr);
 		qp->resp.mr = NULL;
 	}
 
@@ -809,20 +809,20 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 		rxe_queue_cleanup(qp->sq.queue);
 
 	if (qp->srq)
-		rxe_drop_ref(qp->srq);
+		rxe_put(qp->srq);
 
 	if (qp->rq.queue)
 		rxe_queue_cleanup(qp->rq.queue);
 
 	if (qp->scq)
-		rxe_drop_ref(qp->scq);
+		rxe_put(qp->scq);
 	if (qp->rcq)
-		rxe_drop_ref(qp->rcq);
+		rxe_put(qp->rcq);
 	if (qp->pd)
-		rxe_drop_ref(qp->pd);
+		rxe_put(qp->pd);
 
 	if (qp->resp.mr)
-		rxe_drop_ref(qp->resp.mr);
+		rxe_put(qp->resp.mr);
 
 	if (qp_type(qp) == IB_QPT_RC)
 		sk_dst_reset(qp->sk->sk);
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 53924453abef..d09a8b68c962 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -217,7 +217,7 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 	return 0;
 
 err2:
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 err1:
 	return -EINVAL;
 }
@@ -288,11 +288,11 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 
 			cpkt = SKB_TO_PKT(cskb);
 			cpkt->qp = qp;
-			rxe_add_ref(qp);
+			rxe_get(qp);
 			rxe_rcv_pkt(cpkt, cskb);
 		} else {
 			pkt->qp = qp;
-			rxe_add_ref(qp);
+			rxe_get(qp);
 			rxe_rcv_pkt(pkt, skb);
 			skb = NULL;	/* mark consumed */
 		}
@@ -397,7 +397,7 @@ void rxe_rcv(struct sk_buff *skb)
 
 drop:
 	if (pkt->qp)
-		rxe_drop_ref(pkt->qp);
+		rxe_put(pkt->qp);
 
 	kfree_skb(skb);
 	ib_device_put(&rxe->ib_dev);
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index f44535f82bea..2bde9e767dc7 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -611,7 +611,7 @@ int rxe_requester(void *arg)
 	struct rxe_ah *ah;
 	struct rxe_av *av;
 
-	rxe_add_ref(qp);
+	rxe_get(qp);
 
 next_wqe:
 	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
@@ -690,7 +690,7 @@ int rxe_requester(void *arg)
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
 			__rxe_do_task(&qp->comp.task);
-			rxe_drop_ref(qp);
+			rxe_put(qp);
 			return 0;
 		}
 		payload = mtu;
@@ -729,7 +729,7 @@ int rxe_requester(void *arg)
 	}
 
 	if (ah)
-		rxe_drop_ref(ah);
+		rxe_put(ah);
 
 	/*
 	 * To prevent a race on wqe access between requester and completer,
@@ -761,12 +761,12 @@ int rxe_requester(void *arg)
 
 err_drop_ah:
 	if (ah)
-		rxe_drop_ref(ah);
+		rxe_put(ah);
 err:
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
 
 exit:
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 	return -EAGAIN;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b1ec003f0bb8..16fc7ea1298d 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -99,7 +99,7 @@ static inline enum resp_states get_req(struct rxe_qp *qp,
 
 	if (qp->resp.state == QP_STATE_ERROR) {
 		while ((skb = skb_dequeue(&qp->req_pkts))) {
-			rxe_drop_ref(qp);
+			rxe_put(qp);
 			kfree_skb(skb);
 			ib_device_put(qp->ibqp.device);
 		}
@@ -464,8 +464,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 		if (mw->access & IB_ZERO_BASED)
 			qp->resp.offset = mw->addr;
 
-		rxe_drop_ref(mw);
-		rxe_add_ref(mr);
+		rxe_put(mw);
+		rxe_get(mr);
 	} else {
 		mr = lookup_mr(qp->pd, access, rkey, RXE_LOOKUP_REMOTE);
 		if (!mr) {
@@ -508,9 +508,9 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 
 err:
 	if (mr)
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 	if (mw)
-		rxe_drop_ref(mw);
+		rxe_put(mw);
 
 	return state;
 }
@@ -694,12 +694,12 @@ static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey)
 			return NULL;
 
 		if (mw->state != RXE_MW_STATE_VALID) {
-			rxe_drop_ref(mw);
+			rxe_put(mw);
 			return NULL;
 		}
 
 		mr = mw->mr;
-		rxe_drop_ref(mw);
+		rxe_put(mw);
 	} else {
 		mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
 		if (!mr || mr->rkey != rkey)
@@ -707,7 +707,7 @@ static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey)
 	}
 
 	if (mr->state != RXE_MR_STATE_VALID) {
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 		return NULL;
 	}
 
@@ -768,7 +768,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (err)
 		pr_err("Failed copying memory\n");
 	if (mr)
-		rxe_drop_ref(mr);
+		rxe_put(mr);
 
 	if (bth_pad(&ack_pkt)) {
 		u8 *pad = payload_addr(&ack_pkt) + payload;
@@ -1037,7 +1037,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	rc = rxe_xmit_packet(qp, &ack_pkt, skb);
 	if (rc) {
 		pr_err_ratelimited("Failed sending ack\n");
-		rxe_drop_ref(qp);
+		rxe_put(qp);
 	}
 out:
 	return rc;
@@ -1066,13 +1066,13 @@ static enum resp_states cleanup(struct rxe_qp *qp,
 
 	if (pkt) {
 		skb = skb_dequeue(&qp->req_pkts);
-		rxe_drop_ref(qp);
+		rxe_put(qp);
 		kfree_skb(skb);
 		ib_device_put(qp->ibqp.device);
 	}
 
 	if (qp->resp.mr) {
-		rxe_drop_ref(qp->resp.mr);
+		rxe_put(qp->resp.mr);
 		qp->resp.mr = NULL;
 	}
 
@@ -1216,7 +1216,7 @@ static enum resp_states do_class_d1e_error(struct rxe_qp *qp)
 		}
 
 		if (qp->resp.mr) {
-			rxe_drop_ref(qp->resp.mr);
+			rxe_put(qp->resp.mr);
 			qp->resp.mr = NULL;
 		}
 
@@ -1230,7 +1230,7 @@ static void rxe_drain_req_pkts(struct rxe_qp *qp, bool notify)
 	struct rxe_queue *q = qp->rq.queue;
 
 	while ((skb = skb_dequeue(&qp->req_pkts))) {
-		rxe_drop_ref(qp);
+		rxe_put(qp);
 		kfree_skb(skb);
 		ib_device_put(qp->ibqp.device);
 	}
@@ -1250,7 +1250,7 @@ int rxe_responder(void *arg)
 	struct rxe_pkt_info *pkt = NULL;
 	int ret = 0;
 
-	rxe_add_ref(qp);
+	rxe_get(qp);
 
 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
@@ -1437,6 +1437,6 @@ int rxe_responder(void *arg)
 exit:
 	ret = -EAGAIN;
 done:
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 	return ret;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f0c5715ac500..67184b0281a0 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -115,7 +115,7 @@ static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
 {
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
-	rxe_drop_ref(uc);
+	rxe_put(uc);
 }
 
 static int rxe_port_immutable(struct ib_device *dev, u32 port_num,
@@ -149,7 +149,7 @@ static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct rxe_pd *pd = to_rpd(ibpd);
 
-	rxe_drop_ref(pd);
+	rxe_put(pd);
 	return 0;
 }
 
@@ -188,7 +188,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
 					 sizeof(uresp->ah_num));
 		if (err) {
-			rxe_drop_ref(ah);
+			rxe_put(ah);
 			return -EFAULT;
 		}
 	} else if (ah->is_user) {
@@ -228,7 +228,7 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
-	rxe_drop_ref(ah);
+	rxe_put(ah);
 	return 0;
 }
 
@@ -303,7 +303,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	if (err)
 		goto err1;
 
-	rxe_add_ref(pd);
+	rxe_get(pd);
 	srq->pd = pd;
 
 	err = rxe_srq_from_init(rxe, srq, init, udata, uresp);
@@ -313,8 +313,8 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	return 0;
 
 err2:
-	rxe_drop_ref(pd);
-	rxe_drop_ref(srq);
+	rxe_put(pd);
+	rxe_put(srq);
 err1:
 	return err;
 }
@@ -371,8 +371,8 @@ static int rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
 	if (srq->rq.queue)
 		rxe_queue_cleanup(srq->rq.queue);
 
-	rxe_drop_ref(srq->pd);
-	rxe_drop_ref(srq);
+	rxe_put(srq->pd);
+	rxe_put(srq);
 	return 0;
 }
 
@@ -442,7 +442,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	return 0;
 
 qp_init:
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 	return err;
 }
 
@@ -496,7 +496,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 		return ret;
 
 	rxe_qp_destroy(qp);
-	rxe_drop_ref(qp);
+	rxe_put(qp);
 	return 0;
 }
 
@@ -809,7 +809,7 @@ static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 
 	rxe_cq_disable(cq);
 
-	rxe_drop_ref(cq);
+	rxe_put(cq);
 	return 0;
 }
 
@@ -902,7 +902,7 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	rxe_add_ref(pd);
+	rxe_get(pd);
 	rxe_mr_init_dma(pd, access, mr);
 
 	return &mr->ibmr;
@@ -926,7 +926,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 	}
 
 
-	rxe_add_ref(pd);
+	rxe_get(pd);
 
 	err = rxe_mr_init_user(pd, start, length, iova, access, mr);
 	if (err)
@@ -935,8 +935,8 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 	return &mr->ibmr;
 
 err3:
-	rxe_drop_ref(pd);
-	rxe_drop_ref(mr);
+	rxe_put(pd);
+	rxe_put(mr);
 err2:
 	return ERR_PTR(err);
 }
@@ -958,7 +958,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		goto err1;
 	}
 
-	rxe_add_ref(pd);
+	rxe_get(pd);
 
 	err = rxe_mr_init_fast(pd, max_num_sg, mr);
 	if (err)
@@ -967,8 +967,8 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 	return &mr->ibmr;
 
 err2:
-	rxe_drop_ref(pd);
-	rxe_drop_ref(mr);
+	rxe_put(pd);
+	rxe_put(mr);
 err1:
 	return ERR_PTR(err);
 }
-- 
2.32.0



* [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (8 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 09/13] RDMA/rxe: Use standard names for ref counting Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-16  0:16   ` Jason Gunthorpe
  2022-03-04  0:08 ` [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rdma_rxe driver has a security weakness: it gives
partially initialized objects indices, allowing external actors to
gain access to them by sending packets which refer to their index
(e.g. qpn, rkey, etc.), causing unpredictable results.

This patch adds two new APIs, rxe_show(obj) and rxe_hide(obj), which
enable or disable looking up pool objects from indices using
rxe_pool_get_index(). By default lookup of an object is disabled.
These APIs are used to enable looking up the objects which have
indices: AH, SRQ, QP, MR, and MW. rxe_show() is called in create
verbs after the objects are fully initialized, and rxe_hide() as
early as possible in destroy verbs.
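
A rough sketch of the resulting pattern in a create/destroy verb
pair ("foo" is a placeholder type, not part of the driver):

	static int rxe_create_foo(struct rxe_dev *rxe, struct rxe_foo *foo)
	{
		int err = rxe_add_to_pool(&rxe->foo_pool, foo);

		if (err)
			return err;
		/* finish initializing foo */
		rxe_show(foo);	/* now visible to rxe_pool_get_index() */
		return 0;
	}

	static int rxe_destroy_foo(struct rxe_foo *foo)
	{
		rxe_hide(foo);	/* new lookups can no longer find foo */
		/* tear down foo */
		rxe_put(foo);
		return 0;
	}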

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c    |  2 ++
 drivers/infiniband/sw/rxe/rxe_mw.c    |  4 +++
 drivers/infiniband/sw/rxe/rxe_pool.c  | 40 +++++++++++++++++++++++++--
 drivers/infiniband/sw/rxe/rxe_pool.h  |  5 ++++
 drivers/infiniband/sw/rxe/rxe_verbs.c | 12 ++++++++
 5 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 60a31b718774..e4ad2cc12584 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -689,6 +689,8 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 		return -EINVAL;
 	}
 
+	rxe_hide(mr);
+
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_put(mr_pd(mr));
 	rxe_put(mr);
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index c86b2efd58f2..4edbed1410e9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -25,6 +25,8 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 			RXE_MW_STATE_FREE : RXE_MW_STATE_VALID;
 	spin_lock_init(&mw->lock);
 
+	rxe_show(mw);
+
 	return 0;
 }
 
@@ -56,6 +58,8 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
 	struct rxe_mw *mw = to_rmw(ibmw);
 	struct rxe_pd *pd = to_rpd(ibmw->pd);
 
+	rxe_hide(mw);
+
 	spin_lock_bh(&mw->lock);
 	rxe_do_dealloc_mw(mw);
 	spin_unlock_bh(&mw->lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index b0dfeb08a470..c0b70687a83e 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -159,7 +159,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
 	if (err)
 		goto err_free;
@@ -187,7 +187,7 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
 	if (err)
 		goto err_cnt;
@@ -221,8 +221,12 @@ static void rxe_elem_release(struct kref *kref)
 {
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
+	struct xarray *xa = &pool->xa;
+	unsigned long flags;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_lock_irqsave(xa, flags);
+	__xa_erase(&pool->xa, elem->index);
+	xa_unlock_irqrestore(xa, flags);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
@@ -242,3 +246,33 @@ int __rxe_put(struct rxe_pool_elem *elem)
 {
 	return kref_put(&elem->ref_cnt, rxe_elem_release);
 }
+
+int __rxe_show(struct rxe_pool_elem *elem)
+{
+	struct xarray *xa = &elem->pool->xa;
+	unsigned long flags;
+	void *ret;
+
+	xa_lock_irqsave(xa, flags);
+	ret = __xa_store(&elem->pool->xa, elem->index, elem, GFP_KERNEL);
+	xa_unlock_irqrestore(xa, flags);
+	if (IS_ERR(ret))
+		return PTR_ERR(ret);
+	else
+		return 0;
+}
+
+int __rxe_hide(struct rxe_pool_elem *elem)
+{
+	struct xarray *xa = &elem->pool->xa;
+	unsigned long flags;
+	void *ret;
+
+	xa_lock_irqsave(xa, flags);
+	ret = __xa_store(&elem->pool->xa, elem->index, NULL, GFP_KERNEL);
+	xa_unlock_irqrestore(xa, flags);
+	if (IS_ERR(ret))
+		return PTR_ERR(ret);
+	else
+		return 0;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 24bcc786c1b3..c48d8f6f95f3 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -79,4 +79,9 @@ int __rxe_put(struct rxe_pool_elem *elem);
 
 #define rxe_read(obj) kref_read(&(obj)->elem.ref_cnt)
 
+int __rxe_show(struct rxe_pool_elem *elem);
+#define rxe_show(obj) __rxe_show(&(obj)->elem)
+int __rxe_hide(struct rxe_pool_elem *elem);
+#define rxe_hide(obj) __rxe_hide(&(obj)->elem)
+
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 67184b0281a0..e010e8860492 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -197,6 +197,8 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	}
 
 	rxe_init_av(init_attr->ah_attr, &ah->av);
+	rxe_show(ah);
+
 	return 0;
 }
 
@@ -228,6 +230,7 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
+	rxe_hide(ah);
 	rxe_put(ah);
 	return 0;
 }
@@ -310,6 +313,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	if (err)
 		goto err2;
 
+	rxe_show(srq);
 	return 0;
 
 err2:
@@ -368,6 +372,7 @@ static int rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
 {
 	struct rxe_srq *srq = to_rsrq(ibsrq);
 
+	rxe_hide(srq);
 	if (srq->rq.queue)
 		rxe_queue_cleanup(srq->rq.queue);
 
@@ -439,6 +444,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	if (err)
 		goto qp_init;
 
+	rxe_show(qp);
 	return 0;
 
 qp_init:
@@ -491,6 +497,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	struct rxe_qp *qp = to_rqp(ibqp);
 	int ret;
 
+	rxe_hide(qp);
 	ret = rxe_qp_chk_destroy(qp);
 	if (ret)
 		return ret;
@@ -904,6 +911,7 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 
 	rxe_get(pd);
 	rxe_mr_init_dma(pd, access, mr);
+	rxe_show(mr);
 
 	return &mr->ibmr;
 }
@@ -932,6 +940,8 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 	if (err)
 		goto err3;
 
+	rxe_show(mr);
+
 	return &mr->ibmr;
 
 err3:
@@ -964,6 +974,8 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 	if (err)
 		goto err2;
 
+	rxe_show(mr);
+
 	return &mr->ibmr;
 
 err2:
-- 
2.32.0



* [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (9 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-16  0:17   ` Jason Gunthorpe
  2022-03-04  0:08 ` [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu Bob Pearson
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Reference counting for object deletion can require an object to
wait for something else to happen before it gets deleted.
The destroy verbs can then return to rdma-core with the object still
holding references. Adding a wait_for_completion in this path
prevents this.
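
Roughly the shape of the mechanism (a sketch; timeout handling is
elided and the full version is in the diff below):

	/* release no longer frees; it signals the waiting destroyer */
	static void rxe_elem_release(struct kref *kref)
	{
		struct rxe_pool_elem *elem =
			container_of(kref, typeof(*elem), ref_cnt);

		/* erase the index from the xarray, then ... */
		complete(&elem->complete);
	}

	int __rxe_wait(struct rxe_pool_elem *elem)
	{
		__rxe_put(elem);	/* drop the caller's reference */
		wait_for_completion_timeout(&elem->complete,
					    RXE_POOL_TIMEOUT);
		/* cleanup and kfree() are safe now; no refs remain */
		return 0;
	}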

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_mr.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 29 +++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_pool.h  |  5 +++++
 drivers/infiniband/sw/rxe/rxe_verbs.c | 22 ++++++++++----------
 5 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index e4ad2cc12584..83e370ae9df6 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -693,7 +693,7 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_put(mr_pd(mr));
-	rxe_put(mr);
+	rxe_wait(mr);
 
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 4edbed1410e9..157b3f968522 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -64,8 +64,8 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
 	rxe_do_dealloc_mw(mw);
 	spin_unlock_bh(&mw->lock);
 
-	rxe_put(mw);
 	rxe_put(pd);
+	rxe_wait(mw);
 
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index c0b70687a83e..4fb6c7dd32ad 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -6,6 +6,8 @@
 
 #include "rxe.h"
 
+#define RXE_POOL_TIMEOUT	(200)
+#define RXE_POOL_MAX_TIMEOUTS	(3)
 #define RXE_POOL_ALIGN		(16)
 
 static const struct rxe_type_info {
@@ -158,6 +160,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->pool = pool;
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
+	init_completion(&elem->complete);
 
 	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
@@ -186,6 +189,7 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
+	init_completion(&elem->complete);
 
 	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
@@ -228,6 +232,29 @@ static void rxe_elem_release(struct kref *kref)
 	__xa_erase(&pool->xa, elem->index);
 	xa_unlock_irqrestore(xa, flags);
 
+	complete(&elem->complete);
+}
+
+int __rxe_wait(struct rxe_pool_elem *elem)
+{
+	struct rxe_pool *pool = elem->pool;
+	static int timeout = RXE_POOL_TIMEOUT;
+	int ret, err = 0;
+
+	__rxe_put(elem);
+
+	if (timeout) {
+		ret = wait_for_completion_timeout(&elem->complete, timeout);
+		if (!ret) {
+			pr_warn("Timed out waiting for %s#%d to complete\n",
+				pool->name, elem->index);
+			if (++pool->timeouts >= RXE_POOL_MAX_TIMEOUTS)
+				timeout = 0;
+
+			err = -EINVAL;
+		}
+	}
+
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
@@ -235,6 +262,8 @@ static void rxe_elem_release(struct kref *kref)
 		kfree(elem->obj);
 
 	atomic_dec(&pool->num_elem);
+
+	return err;
 }
 
 int __rxe_get(struct rxe_pool_elem *elem)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index c48d8f6f95f3..1863fa165450 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -28,6 +28,7 @@ struct rxe_pool_elem {
 	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
+	struct completion	complete;
 	u32			index;
 };
 
@@ -37,6 +38,7 @@ struct rxe_pool {
 	void			(*cleanup)(struct rxe_pool_elem *elem);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
+	unsigned int		timeouts;
 
 	unsigned int		max_elem;
 	atomic_t		num_elem;
@@ -77,6 +79,9 @@ int __rxe_put(struct rxe_pool_elem *elem);
 
 #define rxe_put(obj) __rxe_put(&(obj)->elem)
 
+int __rxe_wait(struct rxe_pool_elem *elem);
+#define rxe_wait(obj) __rxe_wait(&(obj)->elem)
+
 #define rxe_read(obj) kref_read(&(obj)->elem.ref_cnt)
 
 int __rxe_show(struct rxe_pool_elem *elem);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index e010e8860492..0529ad8e819b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -115,7 +115,7 @@ static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
 {
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
-	rxe_put(uc);
+	rxe_wait(uc);
 }
 
 static int rxe_port_immutable(struct ib_device *dev, u32 port_num,
@@ -149,7 +149,7 @@ static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct rxe_pd *pd = to_rpd(ibpd);
 
-	rxe_put(pd);
+	rxe_wait(pd);
 	return 0;
 }
 
@@ -188,7 +188,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
 					 sizeof(uresp->ah_num));
 		if (err) {
-			rxe_put(ah);
+			rxe_wait(ah);
 			return -EFAULT;
 		}
 	} else if (ah->is_user) {
@@ -231,7 +231,7 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 	struct rxe_ah *ah = to_rah(ibah);
 
 	rxe_hide(ah);
-	rxe_put(ah);
+	rxe_wait(ah);
 	return 0;
 }
 
@@ -318,7 +318,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 
 err2:
 	rxe_put(pd);
-	rxe_put(srq);
+	rxe_wait(srq);
 err1:
 	return err;
 }
@@ -377,7 +377,7 @@ static int rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
 		rxe_queue_cleanup(srq->rq.queue);
 
 	rxe_put(srq->pd);
-	rxe_put(srq);
+	rxe_wait(srq);
 	return 0;
 }
 
@@ -448,7 +448,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	return 0;
 
 qp_init:
-	rxe_put(qp);
+	rxe_wait(qp);
 	return err;
 }
 
@@ -503,7 +503,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 		return ret;
 
 	rxe_qp_destroy(qp);
-	rxe_put(qp);
+	rxe_wait(qp);
 	return 0;
 }
 
@@ -816,7 +816,7 @@ static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 
 	rxe_cq_disable(cq);
 
-	rxe_put(cq);
+	rxe_wait(cq);
 	return 0;
 }
 
@@ -946,7 +946,7 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 
 err3:
 	rxe_put(pd);
-	rxe_put(mr);
+	rxe_wait(mr);
 err2:
 	return ERR_PTR(err);
 }
@@ -980,7 +980,7 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 
 err2:
 	rxe_put(pd);
-	rxe_put(mr);
+	rxe_wait(mr);
 err1:
 	return ERR_PTR(err);
 }
-- 
2.32.0



* [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (10 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-16  0:18   ` Jason Gunthorpe
  2022-03-04  0:08 ` [PATCH for-next v11 13/13] RDMA/rxe: Cleanup rxe_pool.c Bob Pearson
  2022-03-16  0:25 ` [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Jason Gunthorpe
  13 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Use rcu_read_lock() for protecting read side operations in rxe_pool.c.
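
The read side below can drop the xarray spinlock because
kref_get_unless_zero() refuses to take a reference once the count
has hit zero, and the element's memory is freed only after an RCU
grace period, so a concurrent reader never touches freed memory.
A sketch of the release-side ordering this relies on:

	__xa_erase(xa, elem->index);	/* unpublish; new lookups miss it */
	complete(&elem->complete);	/* let the destroyer proceed */
	kfree_rcu(elem->obj);		/* free after in-flight readers exit */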

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 4fb6c7dd32ad..ec464b03d120 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -207,16 +207,15 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
 	struct rxe_pool_elem *elem;
 	struct xarray *xa = &pool->xa;
-	unsigned long flags;
 	void *obj;
 
-	xa_lock_irqsave(xa, flags);
+	rcu_read_lock();
 	elem = xa_load(xa, index);
 	if (elem && kref_get_unless_zero(&elem->ref_cnt))
 		obj = elem->obj;
 	else
 		obj = NULL;
-	xa_unlock_irqrestore(xa, flags);
+	rcu_read_unlock();
 
 	return obj;
 }
@@ -259,7 +258,7 @@ int __rxe_wait(struct rxe_pool_elem *elem)
 		pool->cleanup(elem);
 
 	if (pool->flags & RXE_POOL_ALLOC)
-		kfree(elem->obj);
+		kfree_rcu(elem->obj);
 
 	atomic_dec(&pool->num_elem);
 
-- 
2.32.0



* [PATCH for-next v11 13/13] RDMA/rxe: Cleanup rxe_pool.c
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (11 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu Bob Pearson
@ 2022-03-04  0:08 ` Bob Pearson
  2022-03-16  0:25 ` [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Jason Gunthorpe
  13 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-04  0:08 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Minor cleanup of rxe_pool.c. Add kernel-doc comment headers for
the subroutines. Increase the alignment of pool elements.
Convert some printks to WARN_ON()s.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c  | 129 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.c |  27 ++----
 2 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index ec464b03d120..d5cd0e71e9a0 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -1,14 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
+ * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved.
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
  */
 
 #include "rxe.h"
 
-#define RXE_POOL_TIMEOUT	(200)
-#define RXE_POOL_MAX_TIMEOUTS	(3)
-#define RXE_POOL_ALIGN		(16)
+#define RXE_POOL_TIMEOUT	(200)	/* jiffies */
+#define RXE_POOL_ALIGN		(64)
 
 static const struct rxe_type_info {
 	const char *name;
@@ -90,6 +90,14 @@ static const struct rxe_type_info {
 	},
 };
 
+/**
+ * rxe_pool_init - initialize a rxe object pool
+ * @rxe: rxe device pool belongs to
+ * @pool: object pool
+ * @type: pool type
+ *
+ * Called from rxe_init()
+ */
 void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 		   enum rxe_elem_type type)
 {
@@ -113,6 +121,12 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 	pool->limit.max = info->max_index;
 }
 
+/**
+ * rxe_pool_cleanup - free any remaining pool resources
+ * @pool: object pool
+ *
+ * Called from rxe_dealloc()
+ */
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
@@ -136,24 +150,37 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
 
 	if (WARN_ON(elem_count || obj_count))
 		pr_debug("Freed %d indices and %d objects from pool %s\n",
-			elem_count, obj_count, pool->name);
+			 elem_count, obj_count, pool->name);
 }
 
+/**
+ * rxe_alloc - allocate a new pool object
+ * @pool: object pool
+ *
+ * Context: in task.
+ * Returns: object on success else an ERR_PTR
+ */
 void *rxe_alloc(struct rxe_pool *pool)
 {
+	struct xarray *xa = &pool->xa;
 	struct rxe_pool_elem *elem;
 	void *obj;
-	int err;
+	int err = -EINVAL;
 
 	if (WARN_ON(!(pool->flags & RXE_POOL_ALLOC)))
-		return NULL;
+		goto err_out;
+
+	if (WARN_ON(!in_task()))
+		goto err_dec;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto err_cnt;
+		goto err_dec;
 
 	obj = kzalloc(pool->elem_size, GFP_KERNEL);
-	if (!obj)
-		goto err_cnt;
+	if (!obj) {
+		err = -ENOMEM;
+		goto err_dec;
+	}
 
 	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
@@ -162,7 +189,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	kref_init(&elem->ref_cnt);
 	init_completion(&elem->complete);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
+	err = xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
 	if (err)
 		goto err_free;
@@ -171,38 +198,59 @@ void *rxe_alloc(struct rxe_pool *pool)
 
 err_free:
 	kfree(obj);
-err_cnt:
+err_dec:
 	atomic_dec(&pool->num_elem);
-	return NULL;
+err_out:
+	return ERR_PTR(err);
 }
 
+/**
+ * __rxe_add_to_pool - add rdma-core allocated object to rxe object pool
+ * @pool: object pool
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Context: in task.
+ * Returns: 0 on success else an error
+ */
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
-	int err;
+	struct xarray *xa = &pool->xa;
+	int err = -EINVAL;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
-		return -EINVAL;
+		goto err_out;
+
+	if (WARN_ON(!in_task()))
+		goto err_out;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto err_cnt;
+		goto err_dec;
 
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 	init_completion(&elem->complete);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
+	err = xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
 			      &pool->next, GFP_KERNEL);
 	if (err)
-		goto err_cnt;
+		goto err_dec;
 
 	return 0;
 
-err_cnt:
+err_dec:
 	atomic_dec(&pool->num_elem);
-	return -EINVAL;
+err_out:
+	return err;
 }
 
+/**
+ * rxe_pool_get_index - find object in pool with given index
+ * @pool: object pool
+ * @index: index
+ *
+ * Returns: object on success else NULL
+ */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
 	struct rxe_pool_elem *elem;
@@ -220,6 +268,12 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 	return obj;
 }
 
+/**
+ * rxe_elem_release - remove object index and complete
+ * @kref: kref embedded in pool element
+ *
+ * Context: ref count of pool object has reached zero.
+ */
 static void rxe_elem_release(struct kref *kref)
 {
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
@@ -234,6 +288,12 @@ static void rxe_elem_release(struct kref *kref)
 	complete(&elem->complete);
 }
 
+/**
+ * __rxe_wait - put a ref on object and wait for completion
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Returns: 0 if object did not timeout else an error
+ */
 int __rxe_wait(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
@@ -244,12 +304,9 @@ int __rxe_wait(struct rxe_pool_elem *elem)
 
 	if (timeout) {
 		ret = wait_for_completion_timeout(&elem->complete, timeout);
-		if (!ret) {
-			pr_warn("Timed out waiting for %s#%d to complete\n",
+		if (WARN_ON(!ret)) {
+			pr_debug("Timed out waiting for %s#%d to complete\n",
 				pool->name, elem->index);
-			if (++pool->timeouts >= RXE_POOL_MAX_TIMEOUTS)
-				timeout = 0;
-
 			err = -EINVAL;
 		}
 	}
@@ -265,16 +322,34 @@ int __rxe_wait(struct rxe_pool_elem *elem)
 	return err;
 }
 
+/**
+ * __rxe_get - takes a ref on the object unless ref count is zero
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Returns: 1 if reference is added else 0
+ */
 int __rxe_get(struct rxe_pool_elem *elem)
 {
 	return kref_get_unless_zero(&elem->ref_cnt);
 }
 
+/**
+ * __rxe_put - puts a ref on the object
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Returns: 1 if ref count reaches zero and release called else 0
+ */
 int __rxe_put(struct rxe_pool_elem *elem)
 {
 	return kref_put(&elem->ref_cnt, rxe_elem_release);
 }
 
+/**
+ * __rxe_show - enable looking up object from index
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Returns: 0 on success else an error
+ */
 int __rxe_show(struct rxe_pool_elem *elem)
 {
 	struct xarray *xa = &elem->pool->xa;
@@ -290,6 +365,12 @@ int __rxe_show(struct rxe_pool_elem *elem)
 		return 0;
 }
 
+/**
+ * __rxe_hide - disable looking up object from index
+ * @elem: rxe_pool_elem embedded in object
+ *
+ * Returns: 0 on success else an error
+ */
 int __rxe_hide(struct rxe_pool_elem *elem)
 {
 	struct xarray *xa = &elem->pool->xa;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 0529ad8e819b..73f549efe632 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -906,8 +906,8 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	struct rxe_mr *mr;
 
 	mr = rxe_alloc(&rxe->mr_pool);
-	if (!mr)
-		return ERR_PTR(-ENOMEM);
+	if (IS_ERR(mr))
+		return (void *)mr;
 
 	rxe_get(pd);
 	rxe_mr_init_dma(pd, access, mr);
@@ -928,26 +928,22 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 	struct rxe_mr *mr;
 
 	mr = rxe_alloc(&rxe->mr_pool);
-	if (!mr) {
-		err = -ENOMEM;
-		goto err2;
-	}
-
+	if (IS_ERR(mr))
+		return (void *)mr;
 
 	rxe_get(pd);
 
 	err = rxe_mr_init_user(pd, start, length, iova, access, mr);
 	if (err)
-		goto err3;
+		goto err;
 
 	rxe_show(mr);
 
 	return &mr->ibmr;
 
-err3:
+err:
 	rxe_put(pd);
 	rxe_wait(mr);
-err2:
 	return ERR_PTR(err);
 }
 
@@ -963,25 +959,22 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		return ERR_PTR(-EINVAL);
 
 	mr = rxe_alloc(&rxe->mr_pool);
-	if (!mr) {
-		err = -ENOMEM;
-		goto err1;
-	}
+	if (IS_ERR(mr))
+		return (void *)mr;
 
 	rxe_get(pd);
 
 	err = rxe_mr_init_fast(pd, max_num_sg, mr);
 	if (err)
-		goto err2;
+		goto err;
 
 	rxe_show(mr);
 
 	return &mr->ibmr;
 
-err2:
+err:
 	rxe_put(pd);
 	rxe_wait(mr);
-err1:
 	return ERR_PTR(err);
 }
 
-- 
2.32.0



* Re: [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays
  2022-03-04  0:08 ` [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
@ 2022-03-15 23:45   ` Jason Gunthorpe
  2022-03-16  3:05     ` Bob Pearson
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 23:45 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Mar 03, 2022 at 06:08:04PM -0600, Bob Pearson wrote:
>  void rxe_pool_cleanup(struct rxe_pool *pool)
>  {
>  	struct rxe_pool_elem *elem;
> +	struct xarray *xa = &pool->xa;
> +	unsigned long index = 0;
> +	unsigned long max = ULONG_MAX;
> +	unsigned int elem_count = 0;
> +	unsigned int obj_count = 0;
> +
> +	do {
> +		elem = xa_find(xa, &index, max, XA_PRESENT);
> +		if (elem) {
> +			elem_count++;
> +			xa_erase(xa, index);
> +			if (pool->flags & RXE_POOL_ALLOC) {
> +				kfree(elem->obj);
> +				obj_count++;
> +			}
>  		}
> +	} while (elem);
>  
> +	if (WARN_ON(elem_count || obj_count))
> +		pr_debug("Freed %d indices and %d objects from pool %s\n",
> +			elem_count, obj_count, pool->name);

Can this just be 

WARN_ON(!xa_empty(xa));

?

Freeing memory that is still in use upgrades a resource leak to a UAF
security bug, so that is usually not good.

Jason


* Re: [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects
  2022-03-04  0:08 ` [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects Bob Pearson
@ 2022-03-16  0:16   ` Jason Gunthorpe
  2022-03-16  3:55     ` Bob Pearson
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16  0:16 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Mar 03, 2022 at 06:08:06PM -0600, Bob Pearson wrote:
> Currently the rdma_rxe driver has a security weakness: it gives
> partially initialized objects indices, allowing external actors to
> gain access to them by sending packets which refer to their index
> (e.g. qpn, rkey, etc.), causing unpredictable results.
> 
> This patch adds two new APIs, rxe_show(obj) and rxe_hide(obj), which
> enable or disable looking up pool objects from indices using
> rxe_pool_get_index(). By default lookup of an object is disabled.
> These APIs are used to enable looking up the objects which have
> indices: AH, SRQ, QP, MR, and MW. rxe_show() is called in create
> verbs after the objects are fully initialized, and rxe_hide() as
> early as possible in destroy verbs.

In other parts of rdma we use the word 'finalize' where you used show

So rxe_pool_finalize() or something

I'm not clear on what hide is supposed to be for, if the object is
being destroyed why do we need a period when it is NULL in the xarray
before just erasing it?

> @@ -221,8 +221,12 @@ static void rxe_elem_release(struct kref *kref)
>  {
>  	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
>  	struct rxe_pool *pool = elem->pool;
> +	struct xarray *xa = &pool->xa;
> +	unsigned long flags;
>  
> -	xa_erase(&pool->xa, elem->index);
> +	xa_lock_irqsave(xa, flags);
> +	__xa_erase(&pool->xa, elem->index);
> +	xa_unlock_irqrestore(xa, flags);

I guess it has to do with this, but why have the xa_erase in the kref
release at all?

>  	if (pool->cleanup)
>  		pool->cleanup(elem);
> @@ -242,3 +246,33 @@ int __rxe_put(struct rxe_pool_elem *elem)
>  {
>  	return kref_put(&elem->ref_cnt, rxe_elem_release);
>  }
> +
> +int __rxe_show(struct rxe_pool_elem *elem)
> +{
> +	struct xarray *xa = &elem->pool->xa;
> +	unsigned long flags;
> +	void *ret;
> +
> +	xa_lock_irqsave(xa, flags);
> +	ret = __xa_store(&elem->pool->xa, elem->index, elem, GFP_KERNEL);
> +	xa_unlock_irqrestore(xa, flags);
> +	if (IS_ERR(ret))
> +		return PTR_ERR(ret);

This can't fail due to the xa memory already being allocated. You can
just WARN_ON here and 'finalize' should not return an error code.

If you want to be fancy this checks for other mistakes too:

   tmp = xa_cmpxchg(&elem->pool->xa, elem->index, XA_ZERO_ENTRY, elem, 0);
   WARN_ON(tmp != NULL);

> +int __rxe_hide(struct rxe_pool_elem *elem)
> +{
> +	struct xarray *xa = &elem->pool->xa;
> +	unsigned long flags;
> +	void *ret;
> +
> +	xa_lock_irqsave(xa, flags);
> +	ret = __xa_store(&elem->pool->xa, elem->index, NULL, GFP_KERNEL);
> +	xa_unlock_irqrestore(xa, flags);

IIRC storing NULL is the same as erase, isn't it?  You have to store
XA_ZERO_ENTRY if you want to keep an allocated NULL
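
Roughly the distinction, in a sketch:

	xa_store(xa, index, NULL, 0);	   /* same as xa_erase(): index freed */
	xa_reserve(xa, index, GFP_KERNEL); /* stores XA_ZERO_ENTRY: the index
					      stays allocated and xa_load()
					      returns NULL */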

> +	if (IS_ERR(ret))
> +		return PTR_ERR(ret);
> +	else
> +		return 0;
> +}

Same remark about the error handling

> @@ -491,6 +497,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
>  	struct rxe_qp *qp = to_rqp(ibqp);
>  	int ret;
>  
> +	rxe_hide(qp);
>  	ret = rxe_qp_chk_destroy(qp);
>  	if (ret)
>  		return ret;

So we decided not to destroy the QP but wrecked it in the xarray?

Not convinced about the hide at all..

Jason


* Re: [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects
  2022-03-04  0:08 ` [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
@ 2022-03-16  0:17   ` Jason Gunthorpe
  2022-03-16  3:57     ` Bob Pearson
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16  0:17 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Mar 03, 2022 at 06:08:07PM -0600, Bob Pearson wrote:
> Reference counting for object deletion can cause an object to
> wait for something else to happen before an object gets deleted.
> The destroy verbs can then return to rdma-core with the object still
> holding references. Adding wait_for_completion in this path
> prevents this.

Maybe call this rxe_pool_destroy() or something instead of wait

wait doesn't really convey to the reader that the memory is free after
it returns

Jason

* Re: [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu
  2022-03-04  0:08 ` [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu Bob Pearson
@ 2022-03-16  0:18   ` Jason Gunthorpe
  2022-03-16  4:05     ` Bob Pearson
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16  0:18 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Mar 03, 2022 at 06:08:08PM -0600, Bob Pearson wrote:
> Use rcu_read_lock() for protecting read side operations in rxe_pool.c.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>  drivers/infiniband/sw/rxe/rxe_pool.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index 4fb6c7dd32ad..ec464b03d120 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -207,16 +207,15 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
>  {
>  	struct rxe_pool_elem *elem;
>  	struct xarray *xa = &pool->xa;
> -	unsigned long flags;
>  	void *obj;
>  
> -	xa_lock_irqsave(xa, flags);
> +	rcu_read_lock();
>  	elem = xa_load(xa, index);
>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
>  		obj = elem->obj;
>  	else
>  		obj = NULL;
> -	xa_unlock_irqrestore(xa, flags);
> +	rcu_read_unlock();

This makes the hide even more confusing because hide is racey with
a RCU lookup.

Looks OK otherwise though

Jason

* Re: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool
  2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
                   ` (12 preceding siblings ...)
  2022-03-04  0:08 ` [PATCH for-next v11 13/13] RDMA/rxe: Cleanup rxe_pool.c Bob Pearson
@ 2022-03-16  0:25 ` Jason Gunthorpe
  2022-03-16  4:05   ` Bob Pearson
  13 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16  0:25 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> driver.  They mostly relate to races between normal operations and
> destroying objects.  This patch series
>  - Makes several minor cleanups in rxe_pool.[ch]
>  - Replaces the red-black trees currently used by xarrays for indices
>  - Corrects several reference counting errors
>  - Adds wait for completions to the paths in verbs APIs which destroy
>    objects.
>  - Changes read side locking to rcu.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> v11
>   Rebased to current for-next.
>   Reordered patches and made other changes to respond to issues
>   reported by Jason Gunthorpe.
> v10
>   Rebased to current wip/jgg-for-next.
>   Split some patches into smaller ones.
> v9
>   Corrected issues reported by Jason Gunthorpe,
>   Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
>   Split up the patches into smaller changes
> v8
>   Fixed an additional race in 3/8 which was not handled correctly.
> v7
>   Corrected issues reported by Jason Gunthorpe
> Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
> Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
> Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
> v6
>   Fixed a kzalloc flags bug.
>   Fixed comment bug reported by 'Kernel Test Robot'.
>   Changed type of rxe_pool.c in __rxe_fini().
> v5
>   Removed patches already accepted into for-next and addressed comments
>   from Jason Gunthorpe.
> v4
>   Restructured patch series to change to xarray earlier which
>   greatly simplified the changes.
>   Rebased to current for-next
> v3
>   Changed rxe_alloc to use GFP_KERNEL
>   Addressed other comments by Jason Gunthorpe
>   Merged the previous 06/10 and 07/10 patches into one since they overlapped
>   Added some minor cleanups as 10/10
> v2
>   Rebased to current for-next.
>   Added 4 additional patches
> 
> Bob Pearson (13):
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources
>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Delete _locked() APIs for pool objects
>   RDMA/rxe: Replace obj by elem in declaration
>   RDMA/rxe: Move max_elem into rxe_type_info
>   RDMA/rxe: Shorten pool names in rxe_pool.c
>   RDMA/rxe: Replace red-black trees by xarrays
>   RDMA/rxe: Use standard names for ref counting

If you let me know about the WARN_ON I think up to here is good

Thanks,
Jason

* Re: [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays
  2022-03-15 23:45   ` Jason Gunthorpe
@ 2022-03-16  3:05     ` Bob Pearson
  0 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-16  3:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 3/15/22 18:45, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:08:04PM -0600, Bob Pearson wrote:
>>  void rxe_pool_cleanup(struct rxe_pool *pool)
>>  {
>>  	struct rxe_pool_elem *elem;
>> +	struct xarray *xa = &pool->xa;
>> +	unsigned long index = 0;
>> +	unsigned long max = ULONG_MAX;
>> +	unsigned int elem_count = 0;
>> +	unsigned int obj_count = 0;
>> +
>> +	do {
>> +		elem = xa_find(xa, &index, max, XA_PRESENT);
>> +		if (elem) {
>> +			elem_count++;
>> +			xa_erase(xa, index);
>> +			if (pool->flags & RXE_POOL_ALLOC) {
>> +				kfree(elem->obj);
>> +				obj_count++;
>> +			}
>>  		}
>> +	} while (elem);
>>  
>> +	if (WARN_ON(elem_count || obj_count))
>> +		pr_debug("Freed %d indices and %d objects from pool %s\n",
>> +			elem_count, obj_count, pool->name);
> 
> Can this just be 
> 
> WARN_ON(!xa_empty(xa));
> 
> ?
> 
> Freeing memory that is still in use upgrades a resource leak to a UAF
> security bug, so that is usually not good.
> 
> Jason

FWIW rxe_pool_cleanup() gets called in rxe_dealloc(), which is the
.dealloc_driver member in ib_device_ops. In other words it is not
called until you are unloading the driver; only MRs are freed there,
and the memory will never get used again. I have found it useful when
debugging ref count mistakes.

OTOH, I can save the hunk and apply it when I need it, so I don't
really care if it goes upstream. Feel free to go ahead and change it.
The one line should go into rxe_dealloc() in rxe.c.
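
i.e. the leftover check would be just (a sketch):

	/* in rxe_dealloc(), once all the pools have been emptied */
	WARN_ON(!xa_empty(&pool->xa));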

Bob


* Re: [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects
  2022-03-16  0:16   ` Jason Gunthorpe
@ 2022-03-16  3:55     ` Bob Pearson
  2022-03-16 13:42       ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-16  3:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 3/15/22 19:16, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:08:06PM -0600, Bob Pearson wrote:
>> Currently the rdma_rxe driver has a security weakness due to giving
>> objects which are only partially initialized an index, allowing
>> external actors to gain access to them by sending packets which refer
>> to that index (e.g. qpn, rkey, etc), causing unpredictable results.
>>
>> This patch adds two new APIs, rxe_show(obj) and rxe_hide(obj), which
>> enable or disable looking up pool objects from indices using
>> rxe_pool_get_index(). By default lookup of an object is disabled.
>> These APIs are used for the objects which have indices: AH, SRQ, QP,
>> MR, and MW. rxe_show() is called in the create verbs after the
>> objects are fully initialized, and rxe_hide() is called as early as
>> possible in the destroy verbs.
> 
> In other parts of rdma we use the word 'finalize' where you used show
> 
> So rxe_pool_finalize() or something
> 
> I'm not clear on what hide is supposed to be for, if the object is
> being destroyed why do we need a period when it is NULL in the xarray
> before just erasing it?
The problem I am trying to solve is that when a destroy verb is called
I want to stop looking up rdma objects from their numbers (rkey, qpn,
etc.), which happens when a new packet arrives that refers to the
object. So we start the object destroy flow, but we may hit a long
delay if there are already references taken on the object. In the next
patch we are going to add a wait_for_completion so that we won't
return to rdma-core until all the refs are dropped. While we are
waiting for some past event to complete and drop its reference, new
packets may arrive and take a reference on the object while looking it
up. Potentially this could happen many times. I just want to stop
accepting any new packets as soon as the destroy verb gets called,
while allowing past packets to complete. That is what hide() does.

Show() deals with the opposite problem. The way the object library
worked, as soon as an object was created or 'added to the pool' it
became searchable. But some of the verbs have a lot of code to execute
after the object gets its number assigned, so by passing NULL as the
entry in the xa_alloc call we get the index assigned while the object
is not yet searchable. show() turns lookup on at the end of the create
verb, after all the init code is done.
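
A condensed sketch of that create flow (the pool fields pool->limit
and pool->next are illustrative here, not necessarily the exact patch):

	/* allocate an index; the entry is NULL so lookups fail */
	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL,
			      pool->limit, &pool->next, GFP_KERNEL);
	if (err)
		return err;

	/* ... the rest of the object initialization ... */

	rxe_show(obj);	/* store elem at elem->index; now searchable */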

There are likely better names but these were the ones that came to mind.

> 
>> @@ -221,8 +221,12 @@ static void rxe_elem_release(struct kref *kref)
>>  {
>>  	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
>>  	struct rxe_pool *pool = elem->pool;
>> +	struct xarray *xa = &pool->xa;
>> +	unsigned long flags;
>>  
>> -	xa_erase(&pool->xa, elem->index);
>> +	xa_lock_irqsave(xa, flags);
>> +	__xa_erase(&pool->xa, elem->index);
>> +	xa_unlock_irqrestore(xa, flags);
> 
> I guess it has to do with this, but why have the xa_erase in the kref
> release at all?
> 
>>  	if (pool->cleanup)
>>  		pool->cleanup(elem);
>> @@ -242,3 +246,33 @@ int __rxe_put(struct rxe_pool_elem *elem)
>>  {
>>  	return kref_put(&elem->ref_cnt, rxe_elem_release);
>>  }
>> +
>> +int __rxe_show(struct rxe_pool_elem *elem)
>> +{
>> +	struct xarray *xa = &elem->pool->xa;
>> +	unsigned long flags;
>> +	void *ret;
>> +
>> +	xa_lock_irqsave(xa, flags);
>> +	ret = __xa_store(&elem->pool->xa, elem->index, elem, GFP_KERNEL);
>> +	xa_unlock_irqrestore(xa, flags);
>> +	if (IS_ERR(ret))
>> +		return PTR_ERR(ret);
> 
> This can't fail due to the xa memory already being allocated. You can
> just WARN_ON here and 'finalize' should not return an error code.
> 
> If you want to be fancy this checks for other mistakes too:
> 
>    tmp = xa_cmpxchg(&elem->pool->xa, elem->index, XA_ZERO_ENTRY, elem, 0);
>    WARN_ON(tmp != NULL);
> 
>> +int __rxe_hide(struct rxe_pool_elem *elem)
>> +{
>> +	struct xarray *xa = &elem->pool->xa;
>> +	unsigned long flags;
>> +	void *ret;
>> +
>> +	xa_lock_irqsave(xa, flags);
>> +	ret = __xa_store(&elem->pool->xa, elem->index, NULL, GFP_KERNEL);
>> +	xa_unlock_irqrestore(xa, flags);
> 
> IIRC storing NULL is the same as erase, isn't it?  You have to store
> XA_ZERO_ENTRY if you want to keep an allocated NULL
This works just fine. You are correct for normal xarrays, but
allocating xarrays treat storing NULL differently and remember that
the entry is in use. Here is the relevant text from
<kernel>/Documentation:

Allocating XArrays

If you use DEFINE_XARRAY_ALLOC() to define the XArray, or initialise
it by passing XA_FLAGS_ALLOC to xa_init_flags(), the XArray changes to
track whether entries are in use or not.

You can call xa_alloc() to store the entry at an unused index in the
XArray. If you need to modify the array from interrupt context, you
can use xa_alloc_bh() or xa_alloc_irq() to disable interrupts while
allocating the ID.

Using xa_store(), xa_cmpxchg() or xa_insert() will also mark the entry
as being allocated. Unlike a normal XArray, storing NULL will mark the
entry as being in use, like xa_reserve(). To free an entry, use
xa_erase() (or xa_release() if you only want to free the entry if it's
NULL).
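
A tiny standalone illustration of that behavior (my own example, not
code from the patch):

	DEFINE_XARRAY_ALLOC(xa);
	u32 id;

	xa_alloc(&xa, &id, NULL, xa_limit_32b, GFP_KERNEL);
	/* xa_load(&xa, id) returns NULL, but id stays allocated */
	xa_store(&xa, id, NULL, GFP_KERNEL);	/* still in use, like xa_reserve() */
	xa_erase(&xa, id);			/* only now is id free for reuse */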

> 
>> +	if (IS_ERR(ret))
>> +		return PTR_ERR(ret);
>> +	else
>> +		return 0;
>> +}
> 
> Same remark about the error handling
> 
>> @@ -491,6 +497,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
>>  	struct rxe_qp *qp = to_rqp(ibqp);
>>  	int ret;
>>  
>> +	rxe_hide(qp);
>>  	ret = rxe_qp_chk_destroy(qp);
>>  	if (ret)
>>  		return ret;
> 
> So we decided not to destroy the QP but wrecked it in the xarray?
> 
> Not convinced about the hide at all..
This particular example is pretty lightweight since rxe_qp_chk_destroy
only makes one memory reference, but dereg_mr can actually do a lot of
work. In either case the idea, as explained above, is not to wreck the
object but to prevent rxe_pool_get_index(pool, index) from succeeding
and taking a new ref on the object. Once the verbs client has called a
destroy verb I don't see any reason why we should continue to process
newly arriving packets, and that lookup is the only place in the
driver where we convert object numbers to objects.

There is another issue which is not being dealt with here, and that is
partially torn down objects. After this patch the flow of a destroy
verb for an indexed object is:
hide() =	"disable new lookups, e.g. xa_store(NULL)"
		"misc tear down code"
rxe_put() = 	"drop a reference, will be last one if the object is quiescent"
		"if necessary wait until last ref dropped"
		"object cleanup code, includes xa_erase()"

If objects are still holding references you have to be careful to make
sure that nothing in the misc tear down code matters for an
outstanding reference. Currently this all works, but whenever a change
has broken things we have had to defer that work to the cleanup phase.
It works because of past debugging rather than by design.
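
In code form the destroy flow above is roughly (a condensed sketch,
not the exact patch):

	rxe_hide(qp);		/* xa_store(NULL): stop new lookups */
	rxe_qp_chk_destroy(qp);	/* misc tear down code */
	rxe_put(qp);		/* drop the ref taken at create time */
	/* the next patch then adds: wait for the last ref to drop; the
	 * kref release runs the cleanup code, including xa_erase() */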

Bob


> 
> Jason


* Re: [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects
  2022-03-16  0:17   ` Jason Gunthorpe
@ 2022-03-16  3:57     ` Bob Pearson
  2022-03-16 13:43       ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-16  3:57 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 3/15/22 19:17, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:08:07PM -0600, Bob Pearson wrote:
>> Reference counting for object deletion can cause an object to
>> wait for something else to happen before an object gets deleted.
>> The destroy verbs can then return to rdma-core with the object still
>> holding references. Adding wait_for_completion in this path
>> prevents this.
> 
> Maybe call this rxe_pool_destroy() or something instead of wait
> 
> > wait doesn't really convey to the reader that the memory is free after
> it returns
> 
> Jason

which is correct, because except for MRs, which are freed in
completion code, all the other objects are just passed back to
rdma-core, which will free them in its own good time.

Bob

* Re: [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu
  2022-03-16  0:18   ` Jason Gunthorpe
@ 2022-03-16  4:05     ` Bob Pearson
  0 siblings, 0 replies; 28+ messages in thread
From: Bob Pearson @ 2022-03-16  4:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 3/15/22 19:18, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:08:08PM -0600, Bob Pearson wrote:
>> Use rcu_read_lock() for protecting read side operations in rxe_pool.c.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>  drivers/infiniband/sw/rxe/rxe_pool.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index 4fb6c7dd32ad..ec464b03d120 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -207,16 +207,15 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
>>  {
>>  	struct rxe_pool_elem *elem;
>>  	struct xarray *xa = &pool->xa;
>> -	unsigned long flags;
>>  	void *obj;
>>  
>> -	xa_lock_irqsave(xa, flags);
>> +	rcu_read_lock();
>>  	elem = xa_load(xa, index);
>>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
>>  		obj = elem->obj;
>>  	else
>>  		obj = NULL;
>> -	xa_unlock_irqrestore(xa, flags);
>> +	rcu_read_unlock();
> 
> This makes the hide even more confusing because hide is racey with
> a RCU lookup.
> 
> Looks OK otherwise though
> 
> Jason

This conforms with my understanding of RCU. hide() is a write side
operation, since it changes the xarray contents, while
rxe_pool_get_index() is a read side operation. The xa_load here just
has to pick a side of the fence for what to return as the entry: it
should return either elem or NULL, just not something in the middle.
If we have to put the spinlock around the lookup there is no point in
using RCU. Typically many packets are received for each object
create/destroy operation, so we should benefit from RCU.
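
One consequence worth stating (my reading of the RCU rules, not
something the patch spells out): once lookups run under
rcu_read_lock(), the object memory must not go back to the allocator
until a grace period has passed, e.g. in the cleanup path:

	kfree_rcu(elem->obj);	/* defer the free past a grace period,
				 * instead of an immediate kfree() */

Otherwise a concurrent xa_load() could hand a reader a pointer to
freed memory before kref_get_unless_zero() has a chance to fail.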

Bob


* Re: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool
  2022-03-16  0:25 ` [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Jason Gunthorpe
@ 2022-03-16  4:05   ` Bob Pearson
  2022-03-16 16:08     ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Bob Pearson @ 2022-03-16  4:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 3/15/22 19:25, Jason Gunthorpe wrote:
> On Thu, Mar 03, 2022 at 06:07:56PM -0600, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>> driver.  They mostly relate to races between normal operations and
>> destroying objects.  This patch series
>>  - Makes several minor cleanups in rxe_pool.[ch]
>>  - Replaces the red-black trees currently used by xarrays for indices
>>  - Corrects several reference counting errors
>>  - Adds wait for completions to the paths in verbs APIs which destroy
>>    objects.
>>  - Changes read side locking to rcu.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> v11
>>   Rebased to current for-next.
>>   Reordered patches and made other changes to respond to issues
>>   reported by Jason Gunthorpe.
>> v10
>>   Rebased to current wip/jgg-for-next.
>>   Split some patches into smaller ones.
>> v9
>>   Corrected issues reported by Jason Gunthorpe,
>>   Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
>>   Split up the patches into smaller changes
>> v8
>>   Fixed an additional race in 3/8 which was not handled correctly.
>> v7
>>   Corrected issues reported by Jason Gunthorpe
>> Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
>> Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
>> Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
>> v6
>>   Fixed a kzalloc flags bug.
>>   Fixed comment bug reported by 'Kernel Test Robot'.
>>   Changed type of rxe_pool.c in __rxe_fini().
>> v5
>>   Removed patches already accepted into for-next and addressed comments
>>   from Jason Gunthorpe.
>> v4
>>   Restructured patch series to change to xarray earlier which
>>   greatly simplified the changes.
>>   Rebased to current for-next
>> v3
>>   Changed rxe_alloc to use GFP_KERNEL
>>   Addressed other comments by Jason Gunthorpe
>>   Merged the previous 06/10 and 07/10 patches into one since they overlapped
>>   Added some minor cleanups as 10/10
>> v2
>>   Rebased to current for-next.
>>   Added 4 additional patches
>>
>> Bob Pearson (13):
>>   RDMA/rxe: Fix ref error in rxe_av.c
>>   RDMA/rxe: Replace mr by rkey in responder resources
>>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>>   RDMA/rxe: Delete _locked() APIs for pool objects
>>   RDMA/rxe: Replace obj by elem in declaration
>>   RDMA/rxe: Move max_elem into rxe_type_info
>>   RDMA/rxe: Shorten pool names in rxe_pool.c
>>   RDMA/rxe: Replace red-black trees by xarrays
>>   RDMA/rxe: Use standard names for ref counting
> 
> If you let me know about the WARN_ON I think up to here is good
> 
> Thanks,
> Jason

I agreed to the change.

* Re: [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects
  2022-03-16  3:55     ` Bob Pearson
@ 2022-03-16 13:42       ` Jason Gunthorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16 13:42 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Mar 15, 2022 at 10:55:01PM -0500, Bob Pearson wrote:
> On 3/15/22 19:16, Jason Gunthorpe wrote:
> > On Thu, Mar 03, 2022 at 06:08:06PM -0600, Bob Pearson wrote:
> >> Currently the rdma_rxe driver has a security weakness due to giving
> >> objects which are only partially initialized an index, allowing
> >> external actors to gain access to them by sending packets which refer
> >> to that index (e.g. qpn, rkey, etc), causing unpredictable results.
> >>
> >> This patch adds two new APIs, rxe_show(obj) and rxe_hide(obj), which
> >> enable or disable looking up pool objects from indices using
> >> rxe_pool_get_index(). By default lookup of an object is disabled.
> >> These APIs are used for the objects which have indices: AH, SRQ, QP,
> >> MR, and MW. rxe_show() is called in the create verbs after the
> >> objects are fully initialized, and rxe_hide() is called as early as
> >> possible in the destroy verbs.
> > 
> > In other parts of rdma we use the word 'finalize' where you used show
> > 
> > So rxe_pool_finalize() or something
> > 
> > I'm not clear on what hide is supposed to be for, if the object is
> > being destroyed why do we need a period when it is NULL in the xarray
> > before just erasing it?
> The problem I am trying to solve is that when a destroy verb is called
> I want to stop looking up rdma objects from their numbers (rkey, qpn,
> etc.), which happens when a new packet arrives that refers to the
> object. So we start the object destroy flow, but we may hit a long
> delay if there are already references taken on the object. In the next
> patch we are going to add a wait_for_completion so that we won't
> return to rdma-core until all the refs are dropped. While we are
> waiting for some past event to complete and drop its reference, new
> packets may arrive and take a reference on the object while looking it
> up. Potentially this could happen many times. I just want to stop
> accepting any new packets as soon as the destroy verb gets called,
> while allowing past packets to complete. That is what hide() does.

But why not just do xa_erase? Why try to preserve the index during
this time?

> Show() deals with the opposite problem. The way the object library
> worked, as soon as an object was created or 'added to the pool' it
> became searchable. But some of the verbs have a lot of code to execute
> after the object gets its number assigned, so by passing NULL as the
> entry in the xa_alloc call we get the index assigned while the object
> is not yet searchable. show() turns lookup on at the end of the create
> verb, after all the init code is done.

Show/finalize is a pretty common pattern when working with xarrays; it
is fine

> >> @@ -491,6 +497,7 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
> >>  	struct rxe_qp *qp = to_rqp(ibqp);
> >>  	int ret;
> >>  
> >> +	rxe_hide(qp);
> >>  	ret = rxe_qp_chk_destroy(qp);
> >>  	if (ret)
> >>  		return ret;
> > 
> > So we decided not to destroy the QP but wrecked it in the xarray?
> > 
> > Not convinced about the hide at all..

> This particular example is pretty lightweight since rxe_qp_chk_destroy
> only makes one memory reference, but dereg_mr can actually do a lot of
> work. In either case the idea, as explained above, is not to wreck the
> object but to prevent rxe_pool_get_index(pool, index) from succeeding
> and taking a new ref on the object. Once the verbs client has called a
> destroy verb I don't see any reason why we should continue to process
> newly arriving packets, and that lookup is the only place in the
> driver where we convert object numbers to objects.

I think once you commit to destroying the object then just take it out
of the xarray immediately and go on and do the other stuff.
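
Roughly this ordering (a sketch; it assumes the completion added by
the next patch lives in the pool elem):

	xa_erase(&pool->xa, elem->index);	/* no new lookups possible */
	rxe_put(qp);				/* drop the create reference */
	wait_for_completion(&elem->complete);	/* until all refs are gone */
	/* nothing else can reach the QP now; do all the teardown here */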
 
> There is another issue which is not being dealt with here, and that
> is partially torn down objects. After this patch the flow of a
> destroy verb for an indexed object is:
> 
> hide() =	"disable new lookups, e.g. xa_store(NULL)"
> 		"misc tear down code"
> rxe_put() = 	"drop a reference, will be last one if the object is quiescent"
> 		"if necessary wait until last ref dropped"
> 		"object cleanup code, includes xa_erase()"
> 
> If objects are still holding references you have to be careful to
> make sure that nothing in the misc tear down code matters for an
> outstanding reference. Currently this all works, but whenever a change
> has broken things we have had to defer that work to the cleanup phase.
> It works because of past debugging rather than by design.

I'd say this is not good; the normal pattern I would expect is to
remove it from the xarray and then drive the references to zero before
starting any destruction. Does the wait patch deal with it?

Jason

* Re: [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects
  2022-03-16  3:57     ` Bob Pearson
@ 2022-03-16 13:43       ` Jason Gunthorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16 13:43 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Mar 15, 2022 at 10:57:20PM -0500, Bob Pearson wrote:
> On 3/15/22 19:17, Jason Gunthorpe wrote:
> > On Thu, Mar 03, 2022 at 06:08:07PM -0600, Bob Pearson wrote:
> >> Reference counting for object deletion can cause an object to
> >> wait for something else to happen before an object gets deleted.
> >> The destroy verbs can then return to rdma-core with the object still
> >> holding references. Adding wait_for_completion in this path
> >> prevents this.
> > 
> > Maybe call this rxe_pool_destroy() or something instead of wait
> > 
> > > wait doesn't really convey to the reader that the memory is free after
> > it returns
> > 
> > Jason
> 
> which is correct, because except for MRs, which are freed in
> completion code, all the other objects are just passed back to
> rdma-core, which will free them in its own good time.

If you do as I suggested and put the xa_erase in the rxe_wait then it
is much closer to destroy.

Perhaps 'unadd' or 'isolate' ?

Jason

* Re: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool
  2022-03-16  4:05   ` Bob Pearson
@ 2022-03-16 16:08     ` Jason Gunthorpe
  2022-03-16 16:09       ` Pearson, Robert B
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2022-03-16 16:08 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Mar 15, 2022 at 11:05:48PM -0500, Bob Pearson wrote:
> >> Bob Pearson (13):
> >>   RDMA/rxe: Fix ref error in rxe_av.c
> >>   RDMA/rxe: Replace mr by rkey in responder resources
> >>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
> >>   RDMA/rxe: Delete _locked() APIs for pool objects
> >>   RDMA/rxe: Replace obj by elem in declaration
> >>   RDMA/rxe: Move max_elem into rxe_type_info
> >>   RDMA/rxe: Shorten pool names in rxe_pool.c
> >>   RDMA/rxe: Replace red-black trees by xarrays
> >>   RDMA/rxe: Use standard names for ref counting
> > 
> > If you let me know about the WARN_ON I think up to here is good
> 
> I agreed to the change.

Ok applied to for-next

Thanks,
Jason

* RE: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool
  2022-03-16 16:08     ` Jason Gunthorpe
@ 2022-03-16 16:09       ` Pearson, Robert B
  0 siblings, 0 replies; 28+ messages in thread
From: Pearson, Robert B @ 2022-03-16 16:09 UTC (permalink / raw)
  To: Jason Gunthorpe, Bob Pearson; +Cc: zyjzyj2000, linux-rdma



-----Original Message-----
From: Jason Gunthorpe <jgg@nvidia.com> 
Sent: Wednesday, March 16, 2022 11:08 AM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com; linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next v11 00/13] Fix race conditions in rxe_pool

On Tue, Mar 15, 2022 at 11:05:48PM -0500, Bob Pearson wrote:
> >> Bob Pearson (13):
> >>   RDMA/rxe: Fix ref error in rxe_av.c
> >>   RDMA/rxe: Replace mr by rkey in responder resources
> >>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
> >>   RDMA/rxe: Delete _locked() APIs for pool objects
> >>   RDMA/rxe: Replace obj by elem in declaration
> >>   RDMA/rxe: Move max_elem into rxe_type_info
> >>   RDMA/rxe: Shorten pool names in rxe_pool.c
> >>   RDMA/rxe: Replace red-black trees by xarrays
> >>   RDMA/rxe: Use standard names for ref counting
> > 
> > If you let me know about the WARN_ON I think up to here is good
> 
> I agreed to the change.

Ok applied to for-next

Thanks,
Jason


Thanks!

Thread overview: 28+ messages
2022-03-04  0:07 [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 01/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 02/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
2022-03-04  0:07 ` [PATCH for-next v11 03/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 04/13] RDMA/rxe: Delete _locked() APIs for pool objects Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 05/13] RDMA/rxe: Replace obj by elem in declaration Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 06/13] RDMA/rxe: Move max_elem into rxe_type_info Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 07/13] RDMA/rxe: Shorten pool names in rxe_pool.c Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 08/13] RDMA/rxe: Replace red-black trees by xarrays Bob Pearson
2022-03-15 23:45   ` Jason Gunthorpe
2022-03-16  3:05     ` Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 09/13] RDMA/rxe: Use standard names for ref counting Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 10/13] RDMA/rxe: Stop lookup of partially built objects Bob Pearson
2022-03-16  0:16   ` Jason Gunthorpe
2022-03-16  3:55     ` Bob Pearson
2022-03-16 13:42       ` Jason Gunthorpe
2022-03-04  0:08 ` [PATCH for-next v11 11/13] RDMA/rxe: Add wait_for_completion to pool objects Bob Pearson
2022-03-16  0:17   ` Jason Gunthorpe
2022-03-16  3:57     ` Bob Pearson
2022-03-16 13:43       ` Jason Gunthorpe
2022-03-04  0:08 ` [PATCH for-next v11 12/13] RDMA/rxe: Convert read side locking to rcu Bob Pearson
2022-03-16  0:18   ` Jason Gunthorpe
2022-03-16  4:05     ` Bob Pearson
2022-03-04  0:08 ` [PATCH for-next v11 13/13] RDMA/rxe: Cleanup rxe_pool.c Bob Pearson
2022-03-16  0:25 ` [PATCH for-next v11 00/13] Fix race conditions in rxe_pool Jason Gunthorpe
2022-03-16  4:05   ` Bob Pearson
2022-03-16 16:08     ` Jason Gunthorpe
2022-03-16 16:09       ` Pearson, Robert B
