linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe
@ 2021-11-03  5:02 Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 01/13] RDMA/rxe: Replace irqsave locks with bh locks Bob Pearson
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

There are several race conditions discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series makes several minor cleanups in
rxe_pool.[ch] and replaces the red-black trees currently used by xarrays
which have better atomic behavior.

This patch series applies cleanly to current for-next.
commit 6a463bc9d999 ("Merge branch 'for-rc' into rdma.git for-next")

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorp
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (13):
  RDMA/rxe: Replace irqsave locks with bh locks
  RDMA/rxe: Cleanup rxe_pool_entry
  RDMA/rxe: Copy setup parameters into rxe_pool
  RDMA/rxe: Save object pointer in pool element
  RDMA/rxe: Replace RB tree by xarray for indexes
  RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c
  RDMA/rxe: Remove some #defines from rxe_pool.h
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Replaced keyed rxe objects by indexed objects
  RDMA/rxe: Prevent taking references to dead objects
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources
  RDMA/rxe: Protect against race between get_index and drop_ref

 drivers/infiniband/sw/rxe/rxe.c       | 100 +-----
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |   8 +-
 drivers/infiniband/sw/rxe/rxe_cq.c    |  24 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   |  25 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 167 ++++++---
 drivers/infiniband/sw/rxe/rxe_mr.c    |   7 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  25 +-
 drivers/infiniband/sw/rxe/rxe_net.c   |  17 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 479 ++++++++------------------
 drivers/infiniband/sw/rxe/rxe_pool.h  | 129 ++-----
 drivers/infiniband/sw/rxe/rxe_qp.c    |  16 +-
 drivers/infiniband/sw/rxe/rxe_queue.c |   9 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  |   3 +-
 drivers/infiniband/sw/rxe/rxe_req.c   |  66 ++--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 125 ++++---
 drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_task.c  |  18 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  73 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.h |  23 +-
 20 files changed, 542 insertions(+), 793 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 01/13] RDMA/rxe: Replace irqsave locks with bh locks
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 02/13] RDMA/rxe: Cleanup rxe_pool_entry Bob Pearson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Most of the locks in the rxe driver are _irqsave/restore locks but
in fact there are no interrupt threads that run rxe code or share data
with rxe. There are softirq threads and data sharing so the appropriate
lock type is _bh. This patch replaces all irqsave type locks with bh
type locks.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  |  8 +++----
 drivers/infiniband/sw/rxe/rxe_cq.c    | 20 +++++++-----------
 drivers/infiniband/sw/rxe/rxe_mcast.c |  7 +++----
 drivers/infiniband/sw/rxe/rxe_mw.c    | 15 ++++++--------
 drivers/infiniband/sw/rxe/rxe_pool.c  | 30 +++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_queue.c |  9 ++++----
 drivers/infiniband/sw/rxe/rxe_req.c   | 11 ++++------
 drivers/infiniband/sw/rxe/rxe_task.c  | 18 +++++++---------
 drivers/infiniband/sw/rxe/rxe_verbs.c | 27 ++++++++++--------------
 9 files changed, 59 insertions(+), 86 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d771ba8449a1..f363fe3fa414 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -458,8 +458,6 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 					   struct rxe_pkt_info *pkt,
 					   struct rxe_send_wqe *wqe)
 {
-	unsigned long flags;
-
 	if (wqe->has_rd_atomic) {
 		wqe->has_rd_atomic = 0;
 		atomic_inc(&qp->req.rd_atomic);
@@ -472,11 +470,11 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 
 	if (unlikely(qp->req.state == QP_STATE_DRAIN)) {
 		/* state_lock used by requester & completer */
-		spin_lock_irqsave(&qp->state_lock, flags);
+		spin_lock_bh(&qp->state_lock);
 		if ((qp->req.state == QP_STATE_DRAIN) &&
 		    (qp->comp.psn == qp->req.psn)) {
 			qp->req.state = QP_STATE_DRAINED;
-			spin_unlock_irqrestore(&qp->state_lock, flags);
+			spin_unlock_bh(&qp->state_lock);
 
 			if (qp->ibqp.event_handler) {
 				struct ib_event ev;
@@ -488,7 +486,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 					qp->ibqp.qp_context);
 			}
 		} else {
-			spin_unlock_irqrestore(&qp->state_lock, flags);
+			spin_unlock_bh(&qp->state_lock);
 		}
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index 6848426c074f..84bd8669a80f 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -42,14 +42,13 @@ int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
 static void rxe_send_complete(struct tasklet_struct *t)
 {
 	struct rxe_cq *cq = from_tasklet(cq, t, comp_task);
-	unsigned long flags;
 
-	spin_lock_irqsave(&cq->cq_lock, flags);
+	spin_lock_bh(&cq->cq_lock);
 	if (cq->is_dying) {
-		spin_unlock_irqrestore(&cq->cq_lock, flags);
+		spin_unlock_bh(&cq->cq_lock);
 		return;
 	}
-	spin_unlock_irqrestore(&cq->cq_lock, flags);
+	spin_unlock_bh(&cq->cq_lock);
 
 	cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
 }
@@ -106,15 +105,14 @@ int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe,
 int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
 {
 	struct ib_event ev;
-	unsigned long flags;
 	int full;
 	void *addr;
 
-	spin_lock_irqsave(&cq->cq_lock, flags);
+	spin_lock_bh(&cq->cq_lock);
 
 	full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
 	if (unlikely(full)) {
-		spin_unlock_irqrestore(&cq->cq_lock, flags);
+		spin_unlock_bh(&cq->cq_lock);
 		if (cq->ibcq.event_handler) {
 			ev.device = cq->ibcq.device;
 			ev.element.cq = &cq->ibcq;
@@ -130,7 +128,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
 
 	queue_advance_producer(cq->queue, QUEUE_TYPE_TO_CLIENT);
 
-	spin_unlock_irqrestore(&cq->cq_lock, flags);
+	spin_unlock_bh(&cq->cq_lock);
 
 	if ((cq->notify == IB_CQ_NEXT_COMP) ||
 	    (cq->notify == IB_CQ_SOLICITED && solicited)) {
@@ -143,11 +141,9 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
 
 void rxe_cq_disable(struct rxe_cq *cq)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&cq->cq_lock, flags);
+	spin_lock_bh(&cq->cq_lock);
 	cq->is_dying = true;
-	spin_unlock_irqrestore(&cq->cq_lock, flags);
+	spin_unlock_bh(&cq->cq_lock);
 }
 
 void rxe_cq_cleanup(struct rxe_pool_entry *arg)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 1c1d1b53312d..ba6275fd3edb 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -40,12 +40,11 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	int err;
 	struct rxe_mc_grp *grp;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
-	unsigned long flags;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
 		return -EINVAL;
 
-	write_lock_irqsave(&pool->pool_lock, flags);
+	write_lock_bh(&pool->pool_lock);
 
 	grp = rxe_pool_get_key_locked(pool, mgid);
 	if (grp)
@@ -53,13 +52,13 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 
 	grp = create_grp(rxe, pool, mgid);
 	if (IS_ERR(grp)) {
-		write_unlock_irqrestore(&pool->pool_lock, flags);
+		write_unlock_bh(&pool->pool_lock);
 		err = PTR_ERR(grp);
 		return err;
 	}
 
 done:
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	write_unlock_bh(&pool->pool_lock);
 	*grp_p = grp;
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 9534a7fe1a98..3cbd38578230 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -56,11 +56,10 @@ int rxe_dealloc_mw(struct ib_mw *ibmw)
 {
 	struct rxe_mw *mw = to_rmw(ibmw);
 	struct rxe_pd *pd = to_rpd(ibmw->pd);
-	unsigned long flags;
 
-	spin_lock_irqsave(&mw->lock, flags);
+	spin_lock_bh(&mw->lock);
 	rxe_do_dealloc_mw(mw);
-	spin_unlock_irqrestore(&mw->lock, flags);
+	spin_unlock_bh(&mw->lock);
 
 	rxe_drop_ref(mw);
 	rxe_drop_ref(pd);
@@ -197,7 +196,6 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	u32 mw_rkey = wqe->wr.wr.mw.mw_rkey;
 	u32 mr_lkey = wqe->wr.wr.mw.mr_lkey;
-	unsigned long flags;
 
 	mw = rxe_pool_get_index(&rxe->mw_pool, mw_rkey >> 8);
 	if (unlikely(!mw)) {
@@ -225,7 +223,7 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 		mr = NULL;
 	}
 
-	spin_lock_irqsave(&mw->lock, flags);
+	spin_lock_bh(&mw->lock);
 
 	ret = rxe_check_bind_mw(qp, wqe, mw, mr);
 	if (ret)
@@ -233,7 +231,7 @@ int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 
 	rxe_do_bind_mw(qp, wqe, mw, mr);
 err_unlock:
-	spin_unlock_irqrestore(&mw->lock, flags);
+	spin_unlock_bh(&mw->lock);
 err_drop_mr:
 	if (mr)
 		rxe_drop_ref(mr);
@@ -280,7 +278,6 @@ static void rxe_do_invalidate_mw(struct rxe_mw *mw)
 int rxe_invalidate_mw(struct rxe_qp *qp, u32 rkey)
 {
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
-	unsigned long flags;
 	struct rxe_mw *mw;
 	int ret;
 
@@ -295,7 +292,7 @@ int rxe_invalidate_mw(struct rxe_qp *qp, u32 rkey)
 		goto err_drop_ref;
 	}
 
-	spin_lock_irqsave(&mw->lock, flags);
+	spin_lock_bh(&mw->lock);
 
 	ret = rxe_check_invalidate_mw(qp, mw);
 	if (ret)
@@ -303,7 +300,7 @@ int rxe_invalidate_mw(struct rxe_qp *qp, u32 rkey)
 
 	rxe_do_invalidate_mw(mw);
 err_unlock:
-	spin_unlock_irqrestore(&mw->lock, flags);
+	spin_unlock_bh(&mw->lock);
 err_drop_ref:
 	rxe_drop_ref(mw);
 err:
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 2e80bb6aa957..30178501bb2c 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -261,12 +261,11 @@ int __rxe_add_key_locked(struct rxe_pool_entry *elem, void *key)
 int __rxe_add_key(struct rxe_pool_entry *elem, void *key)
 {
 	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
 	int err;
 
-	write_lock_irqsave(&pool->pool_lock, flags);
+	write_lock_bh(&pool->pool_lock);
 	err = __rxe_add_key_locked(elem, key);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	write_unlock_bh(&pool->pool_lock);
 
 	return err;
 }
@@ -281,11 +280,10 @@ void __rxe_drop_key_locked(struct rxe_pool_entry *elem)
 void __rxe_drop_key(struct rxe_pool_entry *elem)
 {
 	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
 
-	write_lock_irqsave(&pool->pool_lock, flags);
+	write_lock_bh(&pool->pool_lock);
 	__rxe_drop_key_locked(elem);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	write_unlock_bh(&pool->pool_lock);
 }
 
 int __rxe_add_index_locked(struct rxe_pool_entry *elem)
@@ -302,12 +300,11 @@ int __rxe_add_index_locked(struct rxe_pool_entry *elem)
 int __rxe_add_index(struct rxe_pool_entry *elem)
 {
 	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
 	int err;
 
-	write_lock_irqsave(&pool->pool_lock, flags);
+	write_lock_bh(&pool->pool_lock);
 	err = __rxe_add_index_locked(elem);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	write_unlock_bh(&pool->pool_lock);
 
 	return err;
 }
@@ -323,11 +320,10 @@ void __rxe_drop_index_locked(struct rxe_pool_entry *elem)
 void __rxe_drop_index(struct rxe_pool_entry *elem)
 {
 	struct rxe_pool *pool = elem->pool;
-	unsigned long flags;
 
-	write_lock_irqsave(&pool->pool_lock, flags);
+	write_lock_bh(&pool->pool_lock);
 	__rxe_drop_index_locked(elem);
-	write_unlock_irqrestore(&pool->pool_lock, flags);
+	write_unlock_bh(&pool->pool_lock);
 }
 
 void *rxe_alloc_locked(struct rxe_pool *pool)
@@ -447,11 +443,10 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
 	u8 *obj;
-	unsigned long flags;
 
-	read_lock_irqsave(&pool->pool_lock, flags);
+	read_lock_bh(&pool->pool_lock);
 	obj = rxe_pool_get_index_locked(pool, index);
-	read_unlock_irqrestore(&pool->pool_lock, flags);
+	read_unlock_bh(&pool->pool_lock);
 
 	return obj;
 }
@@ -493,11 +488,10 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
 {
 	u8 *obj;
-	unsigned long flags;
 
-	read_lock_irqsave(&pool->pool_lock, flags);
+	read_lock_bh(&pool->pool_lock);
 	obj = rxe_pool_get_key_locked(pool, key);
-	read_unlock_irqrestore(&pool->pool_lock, flags);
+	read_unlock_bh(&pool->pool_lock);
 
 	return obj;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
index 6e6e023c1b45..a1b283dd2d4c 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.c
+++ b/drivers/infiniband/sw/rxe/rxe_queue.c
@@ -151,7 +151,6 @@ int rxe_queue_resize(struct rxe_queue *q, unsigned int *num_elem_p,
 	struct rxe_queue *new_q;
 	unsigned int num_elem = *num_elem_p;
 	int err;
-	unsigned long flags = 0, flags1;
 
 	new_q = rxe_queue_init(q->rxe, &num_elem, elem_size, q->type);
 	if (!new_q)
@@ -165,17 +164,17 @@ int rxe_queue_resize(struct rxe_queue *q, unsigned int *num_elem_p,
 		goto err1;
 	}
 
-	spin_lock_irqsave(consumer_lock, flags1);
+	spin_lock_bh(consumer_lock);
 
 	if (producer_lock) {
-		spin_lock_irqsave(producer_lock, flags);
+		spin_lock_bh(producer_lock);
 		err = resize_finish(q, new_q, num_elem);
-		spin_unlock_irqrestore(producer_lock, flags);
+		spin_unlock_bh(producer_lock);
 	} else {
 		err = resize_finish(q, new_q, num_elem);
 	}
 
-	spin_unlock_irqrestore(consumer_lock, flags1);
+	spin_unlock_bh(consumer_lock);
 
 	rxe_queue_cleanup(new_q);	/* new/old dep on err */
 	if (err)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 0c9d2af15f3d..c8d674da5cc2 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -110,7 +110,6 @@ void rnr_nak_timer(struct timer_list *t)
 static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
 {
 	struct rxe_send_wqe *wqe;
-	unsigned long flags;
 	struct rxe_queue *q = qp->sq.queue;
 	unsigned int index = qp->req.wqe_index;
 	unsigned int cons;
@@ -124,25 +123,23 @@ static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
 		/* check to see if we are drained;
 		 * state_lock used by requester and completer
 		 */
-		spin_lock_irqsave(&qp->state_lock, flags);
+		spin_lock_bh(&qp->state_lock);
 		do {
 			if (qp->req.state != QP_STATE_DRAIN) {
 				/* comp just finished */
-				spin_unlock_irqrestore(&qp->state_lock,
-						       flags);
+				spin_unlock_bh(&qp->state_lock);
 				break;
 			}
 
 			if (wqe && ((index != cons) ||
 				(wqe->state != wqe_state_posted))) {
 				/* comp not done yet */
-				spin_unlock_irqrestore(&qp->state_lock,
-						       flags);
+				spin_unlock_bh(&qp->state_lock);
 				break;
 			}
 
 			qp->req.state = QP_STATE_DRAINED;
-			spin_unlock_irqrestore(&qp->state_lock, flags);
+			spin_unlock_bh(&qp->state_lock);
 
 			if (qp->ibqp.event_handler) {
 				struct ib_event ev;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 6951fdcb31bf..0c4db5bb17d7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -32,25 +32,24 @@ void rxe_do_task(struct tasklet_struct *t)
 {
 	int cont;
 	int ret;
-	unsigned long flags;
 	struct rxe_task *task = from_tasklet(task, t, tasklet);
 
-	spin_lock_irqsave(&task->state_lock, flags);
+	spin_lock_bh(&task->state_lock);
 	switch (task->state) {
 	case TASK_STATE_START:
 		task->state = TASK_STATE_BUSY;
-		spin_unlock_irqrestore(&task->state_lock, flags);
+		spin_unlock_bh(&task->state_lock);
 		break;
 
 	case TASK_STATE_BUSY:
 		task->state = TASK_STATE_ARMED;
 		fallthrough;
 	case TASK_STATE_ARMED:
-		spin_unlock_irqrestore(&task->state_lock, flags);
+		spin_unlock_bh(&task->state_lock);
 		return;
 
 	default:
-		spin_unlock_irqrestore(&task->state_lock, flags);
+		spin_unlock_bh(&task->state_lock);
 		pr_warn("%s failed with bad state %d\n", __func__, task->state);
 		return;
 	}
@@ -59,7 +58,7 @@ void rxe_do_task(struct tasklet_struct *t)
 		cont = 0;
 		ret = task->func(task->arg);
 
-		spin_lock_irqsave(&task->state_lock, flags);
+		spin_lock_bh(&task->state_lock);
 		switch (task->state) {
 		case TASK_STATE_BUSY:
 			if (ret)
@@ -81,7 +80,7 @@ void rxe_do_task(struct tasklet_struct *t)
 			pr_warn("%s failed with bad state %d\n", __func__,
 				task->state);
 		}
-		spin_unlock_irqrestore(&task->state_lock, flags);
+		spin_unlock_bh(&task->state_lock);
 	} while (cont);
 
 	task->ret = ret;
@@ -106,7 +105,6 @@ int rxe_init_task(void *obj, struct rxe_task *task,
 
 void rxe_cleanup_task(struct rxe_task *task)
 {
-	unsigned long flags;
 	bool idle;
 
 	/*
@@ -116,9 +114,9 @@ void rxe_cleanup_task(struct rxe_task *task)
 	task->destroyed = true;
 
 	do {
-		spin_lock_irqsave(&task->state_lock, flags);
+		spin_lock_bh(&task->state_lock);
 		idle = (task->state == TASK_STATE_START);
-		spin_unlock_irqrestore(&task->state_lock, flags);
+		spin_unlock_bh(&task->state_lock);
 	} while (!idle);
 
 	tasklet_kill(&task->tasklet);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 0aa0d7e52773..dcb7436b9346 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -383,10 +383,9 @@ static int rxe_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
 			     const struct ib_recv_wr **bad_wr)
 {
 	int err = 0;
-	unsigned long flags;
 	struct rxe_srq *srq = to_rsrq(ibsrq);
 
-	spin_lock_irqsave(&srq->rq.producer_lock, flags);
+	spin_lock_bh(&srq->rq.producer_lock);
 
 	while (wr) {
 		err = post_one_recv(&srq->rq, wr);
@@ -395,7 +394,7 @@ static int rxe_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
 		wr = wr->next;
 	}
 
-	spin_unlock_irqrestore(&srq->rq.producer_lock, flags);
+	spin_unlock_bh(&srq->rq.producer_lock);
 
 	if (err)
 		*bad_wr = wr;
@@ -634,19 +633,18 @@ static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 	int err;
 	struct rxe_sq *sq = &qp->sq;
 	struct rxe_send_wqe *send_wqe;
-	unsigned long flags;
 	int full;
 
 	err = validate_send_wr(qp, ibwr, mask, length);
 	if (err)
 		return err;
 
-	spin_lock_irqsave(&qp->sq.sq_lock, flags);
+	spin_lock_bh(&qp->sq.sq_lock);
 
 	full = queue_full(sq->queue, QUEUE_TYPE_TO_DRIVER);
 
 	if (unlikely(full)) {
-		spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
+		spin_unlock_bh(&qp->sq.sq_lock);
 		return -ENOMEM;
 	}
 
@@ -655,7 +653,7 @@ static int post_one_send(struct rxe_qp *qp, const struct ib_send_wr *ibwr,
 
 	queue_advance_producer(sq->queue, QUEUE_TYPE_TO_DRIVER);
 
-	spin_unlock_irqrestore(&qp->sq.sq_lock, flags);
+	spin_unlock_bh(&qp->sq.sq_lock);
 
 	return 0;
 }
@@ -735,7 +733,6 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 	int err = 0;
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_rq *rq = &qp->rq;
-	unsigned long flags;
 
 	if (unlikely((qp_state(qp) < IB_QPS_INIT) || !qp->valid)) {
 		*bad_wr = wr;
@@ -749,7 +746,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 		goto err1;
 	}
 
-	spin_lock_irqsave(&rq->producer_lock, flags);
+	spin_lock_bh(&rq->producer_lock);
 
 	while (wr) {
 		err = post_one_recv(rq, wr);
@@ -760,7 +757,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 		wr = wr->next;
 	}
 
-	spin_unlock_irqrestore(&rq->producer_lock, flags);
+	spin_unlock_bh(&rq->producer_lock);
 
 	if (qp->resp.state == QP_STATE_ERROR)
 		rxe_run_task(&qp->resp.task, 1);
@@ -841,9 +838,8 @@ static int rxe_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
 	int i;
 	struct rxe_cq *cq = to_rcq(ibcq);
 	struct rxe_cqe *cqe;
-	unsigned long flags;
 
-	spin_lock_irqsave(&cq->cq_lock, flags);
+	spin_lock_bh(&cq->cq_lock);
 	for (i = 0; i < num_entries; i++) {
 		cqe = queue_head(cq->queue, QUEUE_TYPE_FROM_DRIVER);
 		if (!cqe)
@@ -852,7 +848,7 @@ static int rxe_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
 		memcpy(wc++, &cqe->ibwc, sizeof(*wc));
 		queue_advance_consumer(cq->queue, QUEUE_TYPE_FROM_DRIVER);
 	}
-	spin_unlock_irqrestore(&cq->cq_lock, flags);
+	spin_unlock_bh(&cq->cq_lock);
 
 	return i;
 }
@@ -870,11 +866,10 @@ static int rxe_peek_cq(struct ib_cq *ibcq, int wc_cnt)
 static int rxe_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 {
 	struct rxe_cq *cq = to_rcq(ibcq);
-	unsigned long irq_flags;
 	int ret = 0;
 	int empty;
 
-	spin_lock_irqsave(&cq->cq_lock, irq_flags);
+	spin_lock_bh(&cq->cq_lock);
 	if (cq->notify != IB_CQ_NEXT_COMP)
 		cq->notify = flags & IB_CQ_SOLICITED_MASK;
 
@@ -883,7 +878,7 @@ static int rxe_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 	if ((flags & IB_CQ_REPORT_MISSED_EVENTS) && !empty)
 		ret = 1;
 
-	spin_unlock_irqrestore(&cq->cq_lock, irq_flags);
+	spin_unlock_bh(&cq->cq_lock);
 
 	return ret;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 02/13] RDMA/rxe: Cleanup rxe_pool_entry
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 01/13] RDMA/rxe: Replace irqsave locks with bh locks Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 03/13] RDMA/rxe: Copy setup parameters into rxe_pool Bob Pearson
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently three different names are used to describe rxe pool elements.
They are referred to as entries, elems or pelems. This patch chooses one
'elem' and changes the other ones.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_cq.c    |  4 +-
 drivers/infiniband/sw/rxe/rxe_loc.h   | 10 ++--
 drivers/infiniband/sw/rxe/rxe_mcast.c |  4 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  6 +--
 drivers/infiniband/sw/rxe/rxe_mw.c    |  6 +--
 drivers/infiniband/sw/rxe/rxe_pool.c  | 72 +++++++++++++--------------
 drivers/infiniband/sw/rxe/rxe_pool.h  | 46 ++++++++---------
 drivers/infiniband/sw/rxe/rxe_qp.c    |  6 +--
 drivers/infiniband/sw/rxe/rxe_srq.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h | 22 ++++----
 11 files changed, 89 insertions(+), 91 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index 84bd8669a80f..6baaaa34458e 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -146,9 +146,9 @@ void rxe_cq_disable(struct rxe_cq *cq)
 	spin_unlock_bh(&cq->cq_lock);
 }
 
-void rxe_cq_cleanup(struct rxe_pool_entry *arg)
+void rxe_cq_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_cq *cq = container_of(arg, typeof(*cq), pelem);
+	struct rxe_cq *cq = container_of(elem, typeof(*cq), elem);
 
 	if (cq->queue)
 		rxe_queue_cleanup(cq->queue);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 1ca43b859d80..b1e174afb1d4 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -37,7 +37,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited);
 
 void rxe_cq_disable(struct rxe_cq *cq);
 
-void rxe_cq_cleanup(struct rxe_pool_entry *arg);
+void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
 int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
@@ -51,7 +51,7 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
 
-void rxe_mc_cleanup(struct rxe_pool_entry *arg);
+void rxe_mc_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mmap.c */
 struct rxe_mmap_info {
@@ -89,7 +89,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr);
 int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
-void rxe_mr_cleanup(struct rxe_pool_entry *arg);
+void rxe_mr_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
@@ -97,7 +97,7 @@ int rxe_dealloc_mw(struct ib_mw *ibmw);
 int rxe_bind_mw(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 int rxe_invalidate_mw(struct rxe_qp *qp, u32 rkey);
 struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey);
-void rxe_mw_cleanup(struct rxe_pool_entry *arg);
+void rxe_mw_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_net.c */
 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
@@ -131,7 +131,7 @@ void rxe_qp_error(struct rxe_qp *qp);
 
 void rxe_qp_destroy(struct rxe_qp *qp);
 
-void rxe_qp_cleanup(struct rxe_pool_entry *arg);
+void rxe_qp_cleanup(struct rxe_pool_elem *elem);
 
 static inline int qp_num(struct rxe_qp *qp)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index ba6275fd3edb..bd1ac88b8700 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -168,9 +168,9 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 	}
 }
 
-void rxe_mc_cleanup(struct rxe_pool_entry *arg)
+void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mc_grp *grp = container_of(arg, typeof(*grp), pelem);
+	struct rxe_mc_grp *grp = container_of(elem, typeof(*grp), elem);
 	struct rxe_dev *rxe = grp->rxe;
 
 	rxe_drop_key(grp);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 53271df10e47..25c78aade822 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -50,7 +50,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 
 static void rxe_mr_init(int access, struct rxe_mr *mr)
 {
-	u32 lkey = mr->pelem.index << 8 | rxe_get_next_key(-1);
+	u32 lkey = mr->elem.index << 8 | rxe_get_next_key(-1);
 	u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0;
 
 	/* set ibmr->l/rkey and also copy into private l/rkey
@@ -699,9 +699,9 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 	return 0;
 }
 
-void rxe_mr_cleanup(struct rxe_pool_entry *arg)
+void rxe_mr_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mr *mr = container_of(arg, typeof(*mr), pelem);
+	struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
 
 	ib_umem_release(mr->umem);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 3cbd38578230..32dd8c0b8b9e 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -21,7 +21,7 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 	}
 
 	rxe_add_index(mw);
-	mw->rkey = ibmw->rkey = (mw->pelem.index << 8) | rxe_get_next_key(-1);
+	mw->rkey = ibmw->rkey = (mw->elem.index << 8) | rxe_get_next_key(-1);
 	mw->state = (mw->ibmw.type == IB_MW_TYPE_2) ?
 			RXE_MW_STATE_FREE : RXE_MW_STATE_VALID;
 	spin_lock_init(&mw->lock);
@@ -330,9 +330,9 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 	return mw;
 }
 
-void rxe_mw_cleanup(struct rxe_pool_entry *elem)
+void rxe_mw_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mw *mw = container_of(elem, typeof(*mw), pelem);
+	struct rxe_mw *mw = container_of(elem, typeof(*mw), elem);
 
 	rxe_drop_index(mw);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 30178501bb2c..4b4bf0e03ddd 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -11,7 +11,7 @@ static const struct rxe_type_info {
 	const char *name;
 	size_t size;
 	size_t elem_offset;
-	void (*cleanup)(struct rxe_pool_entry *obj);
+	void (*cleanup)(struct rxe_pool_elem *obj);
 	enum rxe_pool_flags flags;
 	u32 min_index;
 	u32 max_index;
@@ -21,19 +21,19 @@ static const struct rxe_type_info {
 	[RXE_TYPE_UC] = {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
-		.elem_offset	= offsetof(struct rxe_ucontext, pelem),
+		.elem_offset	= offsetof(struct rxe_ucontext, elem),
 		.flags          = RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
-		.elem_offset	= offsetof(struct rxe_pd, pelem),
+		.elem_offset	= offsetof(struct rxe_pd, elem),
 		.flags		= RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
-		.elem_offset	= offsetof(struct rxe_ah, pelem),
+		.elem_offset	= offsetof(struct rxe_ah, elem),
 		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
@@ -41,7 +41,7 @@ static const struct rxe_type_info {
 	[RXE_TYPE_SRQ] = {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
-		.elem_offset	= offsetof(struct rxe_srq, pelem),
+		.elem_offset	= offsetof(struct rxe_srq, elem),
 		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
@@ -49,7 +49,7 @@ static const struct rxe_type_info {
 	[RXE_TYPE_QP] = {
 		.name		= "rxe-qp",
 		.size		= sizeof(struct rxe_qp),
-		.elem_offset	= offsetof(struct rxe_qp, pelem),
+		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
 		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
 		.min_index	= RXE_MIN_QP_INDEX,
@@ -58,14 +58,14 @@ static const struct rxe_type_info {
 	[RXE_TYPE_CQ] = {
 		.name		= "rxe-cq",
 		.size		= sizeof(struct rxe_cq),
-		.elem_offset	= offsetof(struct rxe_cq, pelem),
+		.elem_offset	= offsetof(struct rxe_cq, elem),
 		.flags          = RXE_POOL_NO_ALLOC,
 		.cleanup	= rxe_cq_cleanup,
 	},
 	[RXE_TYPE_MR] = {
 		.name		= "rxe-mr",
 		.size		= sizeof(struct rxe_mr),
-		.elem_offset	= offsetof(struct rxe_mr, pelem),
+		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
 		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MR_INDEX,
@@ -74,7 +74,7 @@ static const struct rxe_type_info {
 	[RXE_TYPE_MW] = {
 		.name		= "rxe-mw",
 		.size		= sizeof(struct rxe_mw),
-		.elem_offset	= offsetof(struct rxe_mw, pelem),
+		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
 		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
 		.min_index	= RXE_MIN_MW_INDEX,
@@ -83,7 +83,7 @@ static const struct rxe_type_info {
 	[RXE_TYPE_MC_GRP] = {
 		.name		= "rxe-mc_grp",
 		.size		= sizeof(struct rxe_mc_grp),
-		.elem_offset	= offsetof(struct rxe_mc_grp, pelem),
+		.elem_offset	= offsetof(struct rxe_mc_grp, elem),
 		.cleanup	= rxe_mc_cleanup,
 		.flags		= RXE_POOL_KEY,
 		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
@@ -92,7 +92,7 @@ static const struct rxe_type_info {
 	[RXE_TYPE_MC_ELEM] = {
 		.name		= "rxe-mc_elem",
 		.size		= sizeof(struct rxe_mc_elem),
-		.elem_offset	= offsetof(struct rxe_mc_elem, pelem),
+		.elem_offset	= offsetof(struct rxe_mc_elem, elem),
 	},
 };
 
@@ -189,15 +189,15 @@ static u32 alloc_index(struct rxe_pool *pool)
 	return index + pool->index.min_index;
 }
 
-static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new)
+static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
 {
 	struct rb_node **link = &pool->index.tree.rb_node;
 	struct rb_node *parent = NULL;
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 
 	while (*link) {
 		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_entry, index_node);
+		elem = rb_entry(parent, struct rxe_pool_elem, index_node);
 
 		if (elem->index == new->index) {
 			pr_warn("element already exists!\n");
@@ -216,16 +216,16 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_entry *new)
 	return 0;
 }
 
-static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new)
+static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
 {
 	struct rb_node **link = &pool->key.tree.rb_node;
 	struct rb_node *parent = NULL;
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 	int cmp;
 
 	while (*link) {
 		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_entry, key_node);
+		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
 
 		cmp = memcmp((u8 *)elem + pool->key.key_offset,
 			     (u8 *)new + pool->key.key_offset, pool->key.key_size);
@@ -247,7 +247,7 @@ static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new)
 	return 0;
 }
 
-int __rxe_add_key_locked(struct rxe_pool_entry *elem, void *key)
+int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key)
 {
 	struct rxe_pool *pool = elem->pool;
 	int err;
@@ -258,7 +258,7 @@ int __rxe_add_key_locked(struct rxe_pool_entry *elem, void *key)
 	return err;
 }
 
-int __rxe_add_key(struct rxe_pool_entry *elem, void *key)
+int __rxe_add_key(struct rxe_pool_elem *elem, void *key)
 {
 	struct rxe_pool *pool = elem->pool;
 	int err;
@@ -270,14 +270,14 @@ int __rxe_add_key(struct rxe_pool_entry *elem, void *key)
 	return err;
 }
 
-void __rxe_drop_key_locked(struct rxe_pool_entry *elem)
+void __rxe_drop_key_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 
 	rb_erase(&elem->key_node, &pool->key.tree);
 }
 
-void __rxe_drop_key(struct rxe_pool_entry *elem)
+void __rxe_drop_key(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 
@@ -286,7 +286,7 @@ void __rxe_drop_key(struct rxe_pool_entry *elem)
 	write_unlock_bh(&pool->pool_lock);
 }
 
-int __rxe_add_index_locked(struct rxe_pool_entry *elem)
+int __rxe_add_index_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 	int err;
@@ -297,7 +297,7 @@ int __rxe_add_index_locked(struct rxe_pool_entry *elem)
 	return err;
 }
 
-int __rxe_add_index(struct rxe_pool_entry *elem)
+int __rxe_add_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 	int err;
@@ -309,7 +309,7 @@ int __rxe_add_index(struct rxe_pool_entry *elem)
 	return err;
 }
 
-void __rxe_drop_index_locked(struct rxe_pool_entry *elem)
+void __rxe_drop_index_locked(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 
@@ -317,7 +317,7 @@ void __rxe_drop_index_locked(struct rxe_pool_entry *elem)
 	rb_erase(&elem->index_node, &pool->index.tree);
 }
 
-void __rxe_drop_index(struct rxe_pool_entry *elem)
+void __rxe_drop_index(struct rxe_pool_elem *elem)
 {
 	struct rxe_pool *pool = elem->pool;
 
@@ -329,7 +329,7 @@ void __rxe_drop_index(struct rxe_pool_entry *elem)
 void *rxe_alloc_locked(struct rxe_pool *pool)
 {
 	const struct rxe_type_info *info = &rxe_type_info[pool->type];
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 	u8 *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
@@ -339,7 +339,7 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_entry *)(obj + info->elem_offset);
+	elem = (struct rxe_pool_elem *)(obj + info->elem_offset);
 
 	elem->pool = pool;
 	kref_init(&elem->ref_cnt);
@@ -354,7 +354,7 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	const struct rxe_type_info *info = &rxe_type_info[pool->type];
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 	u8 *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
@@ -364,7 +364,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_entry *)(obj + info->elem_offset);
+	elem = (struct rxe_pool_elem *)(obj + info->elem_offset);
 
 	elem->pool = pool;
 	kref_init(&elem->ref_cnt);
@@ -376,7 +376,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	return NULL;
 }
 
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem)
+int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
@@ -393,8 +393,8 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem)
 
 void rxe_elem_release(struct kref *kref)
 {
-	struct rxe_pool_entry *elem =
-		container_of(kref, struct rxe_pool_entry, ref_cnt);
+	struct rxe_pool_elem *elem =
+		container_of(kref, struct rxe_pool_elem, ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	u8 *obj;
@@ -414,13 +414,13 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 {
 	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rb_node *node;
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 	u8 *obj;
 
 	node = pool->index.tree.rb_node;
 
 	while (node) {
-		elem = rb_entry(node, struct rxe_pool_entry, index_node);
+		elem = rb_entry(node, struct rxe_pool_elem, index_node);
 
 		if (elem->index > index)
 			node = node->rb_left;
@@ -455,14 +455,14 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 {
 	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rb_node *node;
-	struct rxe_pool_entry *elem;
+	struct rxe_pool_elem *elem;
 	u8 *obj;
 	int cmp;
 
 	node = pool->key.tree.rb_node;
 
 	while (node) {
-		elem = rb_entry(node, struct rxe_pool_entry, key_node);
+		elem = rb_entry(node, struct rxe_pool_elem, key_node);
 
 		cmp = memcmp((u8 *)elem + pool->key.key_offset,
 			     key, pool->key.key_size);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 8ecd9f870aea..e6508f30bbf8 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -30,9 +30,7 @@ enum rxe_elem_type {
 	RXE_NUM_TYPES,		/* keep me last */
 };
 
-struct rxe_pool_entry;
-
-struct rxe_pool_entry {
+struct rxe_pool_elem {
 	struct rxe_pool		*pool;
 	struct kref		ref_cnt;
 	struct list_head	list;
@@ -49,7 +47,7 @@ struct rxe_pool {
 	struct rxe_dev		*rxe;
 	rwlock_t		pool_lock; /* protects pool add/del/search */
 	size_t			elem_size;
-	void			(*cleanup)(struct rxe_pool_entry *obj);
+	void			(*cleanup)(struct rxe_pool_elem *obj);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
 
@@ -89,51 +87,51 @@ void *rxe_alloc_locked(struct rxe_pool *pool);
 void *rxe_alloc(struct rxe_pool *pool);
 
 /* connect already allocated object to pool */
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem);
+int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 
-#define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->pelem)
+#define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
 /* assign an index to an indexed object and insert object into
  *  pool's rb tree holding and not holding the pool_lock
  */
-int __rxe_add_index_locked(struct rxe_pool_entry *elem);
+int __rxe_add_index_locked(struct rxe_pool_elem *elem);
 
-#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->pelem)
+#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->elem)
 
-int __rxe_add_index(struct rxe_pool_entry *elem);
+int __rxe_add_index(struct rxe_pool_elem *elem);
 
-#define rxe_add_index(obj) __rxe_add_index(&(obj)->pelem)
+#define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
 
 /* drop an index and remove object from rb tree
  * holding and not holding the pool_lock
  */
-void __rxe_drop_index_locked(struct rxe_pool_entry *elem);
+void __rxe_drop_index_locked(struct rxe_pool_elem *elem);
 
-#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->pelem)
+#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->elem)
 
-void __rxe_drop_index(struct rxe_pool_entry *elem);
+void __rxe_drop_index(struct rxe_pool_elem *elem);
 
-#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->pelem)
+#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
 
 /* assign a key to a keyed object and insert object into
  * pool's rb tree holding and not holding pool_lock
  */
-int __rxe_add_key_locked(struct rxe_pool_entry *elem, void *key);
+int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key);
 
-#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->pelem, key)
+#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key)
 
-int __rxe_add_key(struct rxe_pool_entry *elem, void *key);
+int __rxe_add_key(struct rxe_pool_elem *elem, void *key);
 
-#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->pelem, key)
+#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key)
 
 /* remove elem from rb tree holding and not holding the pool_lock */
-void __rxe_drop_key_locked(struct rxe_pool_entry *elem);
+void __rxe_drop_key_locked(struct rxe_pool_elem *elem);
 
-#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->pelem)
+#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->elem)
 
-void __rxe_drop_key(struct rxe_pool_entry *elem);
+void __rxe_drop_key(struct rxe_pool_elem *elem);
 
-#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->pelem)
+#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
 
 /* lookup an indexed object from index holding and not holding the pool_lock.
  * takes a reference on object
@@ -153,9 +151,9 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
 void rxe_elem_release(struct kref *kref);
 
 /* take a reference on an object */
-#define rxe_add_ref(elem) kref_get(&(elem)->pelem.ref_cnt)
+#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
 
 /* drop a reference on an object */
-#define rxe_drop_ref(elem) kref_put(&(elem)->pelem.ref_cnt, rxe_elem_release)
+#define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 975321812c87..864bb3ef145f 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -167,7 +167,7 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
 	qp->attr.path_mtu	= 1;
 	qp->mtu			= ib_mtu_enum_to_int(qp->attr.path_mtu);
 
-	qpn			= qp->pelem.index;
+	qpn			= qp->elem.index;
 	port			= &rxe->port;
 
 	switch (init->qp_type) {
@@ -831,9 +831,9 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 }
 
 /* called when the last reference to the qp is dropped */
-void rxe_qp_cleanup(struct rxe_pool_entry *arg)
+void rxe_qp_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_qp *qp = container_of(arg, typeof(*qp), pelem);
+	struct rxe_qp *qp = container_of(elem, typeof(*qp), elem);
 
 	execute_in_process_context(rxe_qp_do_cleanup, &qp->cleanup_work);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_srq.c b/drivers/infiniband/sw/rxe/rxe_srq.c
index eb1c4c3b3a78..0c0721f04357 100644
--- a/drivers/infiniband/sw/rxe/rxe_srq.c
+++ b/drivers/infiniband/sw/rxe/rxe_srq.c
@@ -83,7 +83,7 @@ int rxe_srq_from_init(struct rxe_dev *rxe, struct rxe_srq *srq,
 	srq->ibsrq.event_handler	= init->event_handler;
 	srq->ibsrq.srq_context		= init->srq_context;
 	srq->limit		= init->attr.srq_limit;
-	srq->srq_num		= srq->pelem.index;
+	srq->srq_num		= srq->elem.index;
 	srq->rq.max_wr		= init->attr.max_wr;
 	srq->rq.max_sge		= init->attr.max_sge;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index dcb7436b9346..07ca169110bf 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -182,7 +182,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 
 	/* create index > 0 */
 	rxe_add_index(ah);
-	ah->ah_num = ah->pelem.index;
+	ah->ah_num = ah->elem.index;
 
 	if (uresp) {
 		/* only if new user provider */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 35e041450090..caf1ce118765 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -35,17 +35,17 @@ static inline int psn_compare(u32 psn_a, u32 psn_b)
 
 struct rxe_ucontext {
 	struct ib_ucontext ibuc;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 };
 
 struct rxe_pd {
 	struct ib_pd            ibpd;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 };
 
 struct rxe_ah {
 	struct ib_ah		ibah;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct rxe_av		av;
 	bool			is_user;
 	int			ah_num;
@@ -60,7 +60,7 @@ struct rxe_cqe {
 
 struct rxe_cq {
 	struct ib_cq		ibcq;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct rxe_queue	*queue;
 	spinlock_t		cq_lock;
 	u8			notify;
@@ -95,7 +95,7 @@ struct rxe_rq {
 
 struct rxe_srq {
 	struct ib_srq		ibsrq;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct rxe_pd		*pd;
 	struct rxe_rq		rq;
 	u32			srq_num;
@@ -209,7 +209,7 @@ struct rxe_resp_info {
 
 struct rxe_qp {
 	struct ib_qp		ibqp;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct ib_qp_attr	attr;
 	unsigned int		valid;
 	unsigned int		mtu;
@@ -309,7 +309,7 @@ static inline int rkey_is_mw(u32 rkey)
 }
 
 struct rxe_mr {
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct ib_mr		ibmr;
 
 	struct ib_umem		*umem;
@@ -342,7 +342,7 @@ enum rxe_mw_state {
 
 struct rxe_mw {
 	struct ib_mw		ibmw;
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	spinlock_t		lock;
 	enum rxe_mw_state	state;
 	struct rxe_qp		*qp; /* Type 2 only */
@@ -354,7 +354,7 @@ struct rxe_mw {
 };
 
 struct rxe_mc_grp {
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	spinlock_t		mcg_lock; /* guard group */
 	struct rxe_dev		*rxe;
 	struct list_head	qp_list;
@@ -365,7 +365,7 @@ struct rxe_mc_grp {
 };
 
 struct rxe_mc_elem {
-	struct rxe_pool_entry	pelem;
+	struct rxe_pool_elem	elem;
 	struct list_head	qp_list;
 	struct list_head	grp_list;
 	struct rxe_qp		*qp;
@@ -484,6 +484,6 @@ static inline struct rxe_pd *rxe_mw_pd(struct rxe_mw *mw)
 
 int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name);
 
-void rxe_mc_cleanup(struct rxe_pool_entry *arg);
+void rxe_mc_cleanup(struct rxe_pool_elem *elem);
 
 #endif /* RXE_VERBS_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 03/13] RDMA/rxe: Copy setup parameters into rxe_pool
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 01/13] RDMA/rxe: Replace irqsave locks with bh locks Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 02/13] RDMA/rxe: Cleanup rxe_pool_entry Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 04/13] RDMA/rxe: Save object pointer in pool element Bob Pearson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

In rxe_pool.c copy remaining pool setup parameters from rxe_pool_info into
rxe_pool. This saves looking up rxe_pool_info in the performance path.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 56 ++++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_pool.h |  4 +-
 2 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 4b4bf0e03ddd..50a92ec1a0bc 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -96,11 +96,6 @@ static const struct rxe_type_info {
 	},
 };
 
-static inline const char *pool_name(struct rxe_pool *pool)
-{
-	return rxe_type_info[pool->type].name;
-}
-
 static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
 {
 	int err = 0;
@@ -130,35 +125,36 @@ int rxe_pool_init(
 	enum rxe_elem_type	type,
 	unsigned int		max_elem)
 {
+	const struct rxe_type_info *info = &rxe_type_info[type];
 	int			err = 0;
-	size_t			size = rxe_type_info[type].size;
 
 	memset(pool, 0, sizeof(*pool));
 
 	pool->rxe		= rxe;
+	pool->name		= info->name;
 	pool->type		= type;
 	pool->max_elem		= max_elem;
-	pool->elem_size		= ALIGN(size, RXE_POOL_ALIGN);
-	pool->flags		= rxe_type_info[type].flags;
-	pool->index.tree	= RB_ROOT;
-	pool->key.tree		= RB_ROOT;
-	pool->cleanup		= rxe_type_info[type].cleanup;
+	pool->elem_size		= ALIGN(info->size, RXE_POOL_ALIGN);
+	pool->elem_offset	= info->elem_offset;
+	pool->flags		= info->flags;
+	pool->cleanup		= info->cleanup;
 
 	atomic_set(&pool->num_elem, 0);
 
 	rwlock_init(&pool->pool_lock);
 
-	if (rxe_type_info[type].flags & RXE_POOL_INDEX) {
-		err = rxe_pool_init_index(pool,
-					  rxe_type_info[type].max_index,
-					  rxe_type_info[type].min_index);
+	if (pool->flags & RXE_POOL_INDEX) {
+		pool->index.tree = RB_ROOT;
+		err = rxe_pool_init_index(pool, info->max_index,
+					  info->min_index);
 		if (err)
 			goto out;
 	}
 
-	if (rxe_type_info[type].flags & RXE_POOL_KEY) {
-		pool->key.key_offset = rxe_type_info[type].key_offset;
-		pool->key.key_size = rxe_type_info[type].key_size;
+	if (pool->flags & RXE_POOL_KEY) {
+		pool->key.tree = RB_ROOT;
+		pool->key.key_offset = info->key_offset;
+		pool->key.key_size = info->key_size;
 	}
 
 out:
@@ -169,9 +165,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
 {
 	if (atomic_read(&pool->num_elem) > 0)
 		pr_warn("%s pool destroyed with unfree'd elem\n",
-			pool_name(pool));
+			pool->name);
 
-	bitmap_free(pool->index.table);
+	if (pool->flags & RXE_POOL_INDEX)
+		bitmap_free(pool->index.table);
 }
 
 static u32 alloc_index(struct rxe_pool *pool)
@@ -328,18 +325,17 @@ void __rxe_drop_index(struct rxe_pool_elem *elem)
 
 void *rxe_alloc_locked(struct rxe_pool *pool)
 {
-	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rxe_pool_elem *elem;
 	u8 *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
-	obj = kzalloc(info->size, GFP_ATOMIC);
+	obj = kzalloc(pool->elem_size, GFP_ATOMIC);
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_elem *)(obj + info->elem_offset);
+	elem = (struct rxe_pool_elem *)(obj + pool->elem_offset);
 
 	elem->pool = pool;
 	kref_init(&elem->ref_cnt);
@@ -353,18 +349,17 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 
 void *rxe_alloc(struct rxe_pool *pool)
 {
-	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rxe_pool_elem *elem;
 	u8 *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
-	obj = kzalloc(info->size, GFP_KERNEL);
+	obj = kzalloc(pool->elem_size, GFP_KERNEL);
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_elem *)(obj + info->elem_offset);
+	elem = (struct rxe_pool_elem *)(obj + pool->elem_offset);
 
 	elem->pool = pool;
 	kref_init(&elem->ref_cnt);
@@ -396,14 +391,13 @@ void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem =
 		container_of(kref, struct rxe_pool_elem, ref_cnt);
 	struct rxe_pool *pool = elem->pool;
-	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	u8 *obj;
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
 	if (!(pool->flags & RXE_POOL_NO_ALLOC)) {
-		obj = (u8 *)elem - info->elem_offset;
+		obj = (u8 *)elem - pool->elem_offset;
 		kfree(obj);
 	}
 
@@ -412,7 +406,6 @@ void rxe_elem_release(struct kref *kref)
 
 void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 {
-	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rb_node *node;
 	struct rxe_pool_elem *elem;
 	u8 *obj;
@@ -432,7 +425,7 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 
 	if (node) {
 		kref_get(&elem->ref_cnt);
-		obj = (u8 *)elem - info->elem_offset;
+		obj = (u8 *)elem - pool->elem_offset;
 	} else {
 		obj = NULL;
 	}
@@ -453,7 +446,6 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 
 void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 {
-	const struct rxe_type_info *info = &rxe_type_info[pool->type];
 	struct rb_node *node;
 	struct rxe_pool_elem *elem;
 	u8 *obj;
@@ -477,7 +469,7 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 
 	if (node) {
 		kref_get(&elem->ref_cnt);
-		obj = (u8 *)elem - info->elem_offset;
+		obj = (u8 *)elem - pool->elem_offset;
 	} else {
 		obj = NULL;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index e6508f30bbf8..591e1c0ad438 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -45,14 +45,16 @@ struct rxe_pool_elem {
 
 struct rxe_pool {
 	struct rxe_dev		*rxe;
+	const char		*name;
 	rwlock_t		pool_lock; /* protects pool add/del/search */
-	size_t			elem_size;
 	void			(*cleanup)(struct rxe_pool_elem *obj);
 	enum rxe_pool_flags	flags;
 	enum rxe_elem_type	type;
 
 	unsigned int		max_elem;
 	atomic_t		num_elem;
+	size_t			elem_size;
+	size_t			elem_offset;
 
 	/* only used if indexed */
 	struct {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 04/13] RDMA/rxe: Save object pointer in pool element
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (2 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 03/13] RDMA/rxe: Copy setup parameters into rxe_pool Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes Bob Pearson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

In rxe_pool.c currently there are many cases where it is necessary to
compute the offset from a pool element struct to the object containing
it in a type independent way where the offset is different for each type.
By saving a pointer to the object when they are created extra work can be
saved.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 30 ++++++++++++++++------------
 drivers/infiniband/sw/rxe/rxe_pool.h |  1 +
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 50a92ec1a0bc..276101016848 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -225,7 +225,8 @@ static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
 		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
 
 		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     (u8 *)new + pool->key.key_offset, pool->key.key_size);
+			     (u8 *)new + pool->key.key_offset,
+			     pool->key.key_size);
 
 		if (cmp == 0) {
 			pr_warn("key already exists!\n");
@@ -326,7 +327,7 @@ void __rxe_drop_index(struct rxe_pool_elem *elem)
 void *rxe_alloc_locked(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
-	u8 *obj;
+	void *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
@@ -335,9 +336,10 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_elem *)(obj + pool->elem_offset);
+	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
 	elem->pool = pool;
+	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
 	return obj;
@@ -350,7 +352,7 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
-	u8 *obj;
+	void *obj;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
@@ -359,9 +361,10 @@ void *rxe_alloc(struct rxe_pool *pool)
 	if (!obj)
 		goto out_cnt;
 
-	elem = (struct rxe_pool_elem *)(obj + pool->elem_offset);
+	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
 	elem->pool = pool;
+	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
 	return obj;
@@ -377,6 +380,7 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 		goto out_cnt;
 
 	elem->pool = pool;
+	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
 	return 0;
@@ -391,13 +395,13 @@ void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem =
 		container_of(kref, struct rxe_pool_elem, ref_cnt);
 	struct rxe_pool *pool = elem->pool;
-	u8 *obj;
+	void *obj;
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
 	if (!(pool->flags & RXE_POOL_NO_ALLOC)) {
-		obj = (u8 *)elem - pool->elem_offset;
+		obj = elem->obj;
 		kfree(obj);
 	}
 
@@ -408,7 +412,7 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 {
 	struct rb_node *node;
 	struct rxe_pool_elem *elem;
-	u8 *obj;
+	void *obj;
 
 	node = pool->index.tree.rb_node;
 
@@ -425,7 +429,7 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 
 	if (node) {
 		kref_get(&elem->ref_cnt);
-		obj = (u8 *)elem - pool->elem_offset;
+		obj = elem->obj;
 	} else {
 		obj = NULL;
 	}
@@ -435,7 +439,7 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
-	u8 *obj;
+	void *obj;
 
 	read_lock_bh(&pool->pool_lock);
 	obj = rxe_pool_get_index_locked(pool, index);
@@ -448,7 +452,7 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 {
 	struct rb_node *node;
 	struct rxe_pool_elem *elem;
-	u8 *obj;
+	void *obj;
 	int cmp;
 
 	node = pool->key.tree.rb_node;
@@ -469,7 +473,7 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 
 	if (node) {
 		kref_get(&elem->ref_cnt);
-		obj = (u8 *)elem - pool->elem_offset;
+		obj = elem->obj;
 	} else {
 		obj = NULL;
 	}
@@ -479,7 +483,7 @@ void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 
 void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
 {
-	u8 *obj;
+	void *obj;
 
 	read_lock_bh(&pool->pool_lock);
 	obj = rxe_pool_get_key_locked(pool, key);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 591e1c0ad438..c9fa8429fcf4 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -32,6 +32,7 @@ enum rxe_elem_type {
 
 struct rxe_pool_elem {
 	struct rxe_pool		*pool;
+	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (3 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 04/13] RDMA/rxe: Save object pointer in pool element Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-19 17:38   ` Jason Gunthorpe
  2021-11-03  5:02 ` [PATCH for-next v4 06/13] RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c Bob Pearson
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently the rxe driver uses red-black trees to add indices and keys
to the rxe object pool. Linux xarrays provide a better way to implement
the same functionality for indices but not keys. This patch replaces
edd-black trees by xarrays for indexed objects. Since caller managed
locks for indexed objects are not used these APIs were deleted as well.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       | 100 ++------------
 drivers/infiniband/sw/rxe/rxe_mr.c    |   1 -
 drivers/infiniband/sw/rxe/rxe_mw.c    |   4 -
 drivers/infiniband/sw/rxe/rxe_pool.c  | 189 ++++++--------------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  44 +-----
 drivers/infiniband/sw/rxe/rxe_verbs.c |  12 --
 6 files changed, 62 insertions(+), 288 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 8e0f9c489cab..09c73a0d8513 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -116,97 +116,31 @@ static void rxe_init_ports(struct rxe_dev *rxe)
 }
 
 /* init pools of managed objects */
-static int rxe_init_pools(struct rxe_dev *rxe)
+static void rxe_init_pools(struct rxe_dev *rxe)
 {
-	int err;
-
-	err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC,
-			    rxe->max_ucontext);
-	if (err)
-		goto err1;
-
-	err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD,
-			    rxe->attr.max_pd);
-	if (err)
-		goto err2;
-
-	err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH,
-			    rxe->attr.max_ah);
-	if (err)
-		goto err3;
-
-	err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ,
-			    rxe->attr.max_srq);
-	if (err)
-		goto err4;
-
-	err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP,
-			    rxe->attr.max_qp);
-	if (err)
-		goto err5;
-
-	err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ,
-			    rxe->attr.max_cq);
-	if (err)
-		goto err6;
-
-	err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR,
-			    rxe->attr.max_mr);
-	if (err)
-		goto err7;
-
-	err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW,
-			    rxe->attr.max_mw);
-	if (err)
-		goto err8;
-
-	err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
+	rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC, rxe->max_ucontext);
+	rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD, rxe->attr.max_pd);
+	rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH, rxe->attr.max_ah);
+	rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ, rxe->attr.max_srq);
+	rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP, rxe->attr.max_qp);
+	rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ, rxe->attr.max_cq);
+	rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR, rxe->attr.max_mr);
+	rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW, rxe->attr.max_mw);
+	rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
 			    rxe->attr.max_mcast_grp);
-	if (err)
-		goto err9;
-
-	err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
+	rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
 			    rxe->attr.max_total_mcast_qp_attach);
-	if (err)
-		goto err10;
-
-	return 0;
-
-err10:
-	rxe_pool_cleanup(&rxe->mc_grp_pool);
-err9:
-	rxe_pool_cleanup(&rxe->mw_pool);
-err8:
-	rxe_pool_cleanup(&rxe->mr_pool);
-err7:
-	rxe_pool_cleanup(&rxe->cq_pool);
-err6:
-	rxe_pool_cleanup(&rxe->qp_pool);
-err5:
-	rxe_pool_cleanup(&rxe->srq_pool);
-err4:
-	rxe_pool_cleanup(&rxe->ah_pool);
-err3:
-	rxe_pool_cleanup(&rxe->pd_pool);
-err2:
-	rxe_pool_cleanup(&rxe->uc_pool);
-err1:
-	return err;
 }
 
 /* initialize rxe device state */
-static int rxe_init(struct rxe_dev *rxe)
+static void rxe_init(struct rxe_dev *rxe)
 {
-	int err;
-
 	/* init default device parameters */
 	rxe_init_device_param(rxe);
 
 	rxe_init_ports(rxe);
 
-	err = rxe_init_pools(rxe);
-	if (err)
-		return err;
+	rxe_init_pools(rxe);
 
 	/* init pending mmap list */
 	spin_lock_init(&rxe->mmap_offset_lock);
@@ -214,8 +148,6 @@ static int rxe_init(struct rxe_dev *rxe)
 	INIT_LIST_HEAD(&rxe->pending_mmaps);
 
 	mutex_init(&rxe->usdev_lock);
-
-	return 0;
 }
 
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
@@ -237,11 +169,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
  */
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name)
 {
-	int err;
-
-	err = rxe_init(rxe);
-	if (err)
-		return err;
+	rxe_init(rxe);
 
 	rxe_set_mtu(rxe, mtu);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 25c78aade822..3c4390adfb80 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -693,7 +693,6 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 
 	mr->state = RXE_MR_STATE_INVALID;
 	rxe_drop_ref(mr_pd(mr));
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 
 	return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 32dd8c0b8b9e..3ae981d77c25 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -20,7 +20,6 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 		return ret;
 	}
 
-	rxe_add_index(mw);
 	mw->rkey = ibmw->rkey = (mw->elem.index << 8) | rxe_get_next_key(-1);
 	mw->state = (mw->ibmw.type == IB_MW_TYPE_2) ?
 			RXE_MW_STATE_FREE : RXE_MW_STATE_VALID;
@@ -332,7 +331,4 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey)
 
 void rxe_mw_cleanup(struct rxe_pool_elem *elem)
 {
-	struct rxe_mw *mw = container_of(elem, typeof(*mw), elem);
-
-	rxe_drop_index(mw);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 276101016848..e54433b47365 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -96,37 +96,13 @@ static const struct rxe_type_info {
 	},
 };
 
-static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min)
-{
-	int err = 0;
-
-	if ((max - min + 1) < pool->max_elem) {
-		pr_warn("not enough indices for max_elem\n");
-		err = -EINVAL;
-		goto out;
-	}
-
-	pool->index.max_index = max;
-	pool->index.min_index = min;
-
-	pool->index.table = bitmap_zalloc(max - min + 1, GFP_KERNEL);
-	if (!pool->index.table) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-out:
-	return err;
-}
-
-int rxe_pool_init(
+void rxe_pool_init(
 	struct rxe_dev		*rxe,
 	struct rxe_pool		*pool,
 	enum rxe_elem_type	type,
 	unsigned int		max_elem)
 {
 	const struct rxe_type_info *info = &rxe_type_info[type];
-	int			err = 0;
 
 	memset(pool, 0, sizeof(*pool));
 
@@ -144,11 +120,9 @@ int rxe_pool_init(
 	rwlock_init(&pool->pool_lock);
 
 	if (pool->flags & RXE_POOL_INDEX) {
-		pool->index.tree = RB_ROOT;
-		err = rxe_pool_init_index(pool, info->max_index,
-					  info->min_index);
-		if (err)
-			goto out;
+		xa_init_flags(&pool->xarray.xa, XA_FLAGS_ALLOC);
+		pool->xarray.limit.max = info->max_index;
+		pool->xarray.limit.min = info->min_index;
 	}
 
 	if (pool->flags & RXE_POOL_KEY) {
@@ -156,9 +130,6 @@ int rxe_pool_init(
 		pool->key.key_offset = info->key_offset;
 		pool->key.key_size = info->key_size;
 	}
-
-out:
-	return err;
 }
 
 void rxe_pool_cleanup(struct rxe_pool *pool)
@@ -166,51 +137,6 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
 	if (atomic_read(&pool->num_elem) > 0)
 		pr_warn("%s pool destroyed with unfree'd elem\n",
 			pool->name);
-
-	if (pool->flags & RXE_POOL_INDEX)
-		bitmap_free(pool->index.table);
-}
-
-static u32 alloc_index(struct rxe_pool *pool)
-{
-	u32 index;
-	u32 range = pool->index.max_index - pool->index.min_index + 1;
-
-	index = find_next_zero_bit(pool->index.table, range, pool->index.last);
-	if (index >= range)
-		index = find_first_zero_bit(pool->index.table, range);
-
-	WARN_ON_ONCE(index >= range);
-	set_bit(index, pool->index.table);
-	pool->index.last = index;
-	return index + pool->index.min_index;
-}
-
-static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->index.tree.rb_node;
-	struct rb_node *parent = NULL;
-	struct rxe_pool_elem *elem;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, index_node);
-
-		if (elem->index == new->index) {
-			pr_warn("element already exists!\n");
-			return -EINVAL;
-		}
-
-		if (elem->index > new->index)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->index_node, parent, link);
-	rb_insert_color(&new->index_node, &pool->index.tree);
-
-	return 0;
 }
 
 static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
@@ -284,50 +210,11 @@ void __rxe_drop_key(struct rxe_pool_elem *elem)
 	write_unlock_bh(&pool->pool_lock);
 }
 
-int __rxe_add_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	elem->index = alloc_index(pool);
-	err = rxe_insert_index(pool, elem);
-
-	return err;
-}
-
-int __rxe_add_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_index_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	clear_bit(elem->index - pool->index.min_index, pool->index.table);
-	rb_erase(&elem->index_node, &pool->index.tree);
-}
-
-void __rxe_drop_index(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_index_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-}
-
 void *rxe_alloc_locked(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
+	int err;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
@@ -342,8 +229,18 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
+	if (pool->flags & RXE_POOL_INDEX) {
+		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
+					 pool->xarray.limit,
+					 &pool->xarray.next, GFP_ATOMIC);
+		if (err)
+			goto out_free;
+	}
+
 	return obj;
 
+out_free:
+	kfree(obj);
 out_cnt:
 	atomic_dec(&pool->num_elem);
 	return NULL;
@@ -353,6 +250,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
+	int err;
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
@@ -367,8 +265,18 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
+	if (pool->flags & RXE_POOL_INDEX) {
+		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
+					 pool->xarray.limit,
+					 &pool->xarray.next, GFP_KERNEL);
+		if (err)
+			goto out_free;
+	}
+
 	return obj;
 
+out_free:
+	kfree(obj);
 out_cnt:
 	atomic_dec(&pool->num_elem);
 	return NULL;
@@ -376,6 +284,8 @@ void *rxe_alloc(struct rxe_pool *pool)
 
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
+	int err = -EINVAL;
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
 		goto out_cnt;
 
@@ -383,11 +293,19 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
+	if (pool->flags & RXE_POOL_INDEX) {
+		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
+					 pool->xarray.limit,
+					 &pool->xarray.next, GFP_KERNEL);
+		if (err)
+			goto out_cnt;
+	}
+
 	return 0;
 
 out_cnt:
 	atomic_dec(&pool->num_elem);
-	return -EINVAL;
+	return err;
 }
 
 void rxe_elem_release(struct kref *kref)
@@ -397,6 +315,9 @@ void rxe_elem_release(struct kref *kref)
 	struct rxe_pool *pool = elem->pool;
 	void *obj;
 
+	if (pool->flags & RXE_POOL_INDEX)
+		xa_erase(&pool->xarray.xa, elem->index);
+
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
@@ -408,26 +329,13 @@ void rxe_elem_release(struct kref *kref)
 	atomic_dec(&pool->num_elem);
 }
 
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
+void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 {
-	struct rb_node *node;
 	struct rxe_pool_elem *elem;
 	void *obj;
 
-	node = pool->index.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, index_node);
-
-		if (elem->index > index)
-			node = node->rb_left;
-		else if (elem->index < index)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
+	elem = xa_load(&pool->xarray.xa, index);
+	if (elem) {
 		kref_get(&elem->ref_cnt);
 		obj = elem->obj;
 	} else {
@@ -437,17 +345,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index)
 	return obj;
 }
 
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
-{
-	void *obj;
-
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_index_locked(pool, index);
-	read_unlock_bh(&pool->pool_lock);
-
-	return obj;
-}
-
 void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
 {
 	struct rb_node *node;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index c9fa8429fcf4..163795d633e8 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -40,7 +40,6 @@ struct rxe_pool_elem {
 	struct rb_node		key_node;
 
 	/* only used if indexed */
-	struct rb_node		index_node;
 	u32			index;
 };
 
@@ -59,12 +58,10 @@ struct rxe_pool {
 
 	/* only used if indexed */
 	struct {
-		struct rb_root		tree;
-		unsigned long		*table;
-		u32			last;
-		u32			max_index;
-		u32			min_index;
-	} index;
+		struct xarray		xa;
+		struct xa_limit		limit;
+		u32			next;
+	} xarray;
 
 	/* only used if keyed */
 	struct {
@@ -74,11 +71,7 @@ struct rxe_pool {
 	} key;
 };
 
-/* initialize a pool of objects with given limit on
- * number of elements. gets parameters from rxe_type_info
- * pool elements will be allocated out of a slab cache
- */
-int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
+void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 		  enum rxe_elem_type type, u32 max_elem);
 
 /* free resources from object pool */
@@ -94,28 +87,6 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 
 #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
 
-/* assign an index to an indexed object and insert object into
- *  pool's rb tree holding and not holding the pool_lock
- */
-int __rxe_add_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->elem)
-
-int __rxe_add_index(struct rxe_pool_elem *elem);
-
-#define rxe_add_index(obj) __rxe_add_index(&(obj)->elem)
-
-/* drop an index and remove object from rb tree
- * holding and not holding the pool_lock
- */
-void __rxe_drop_index_locked(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->elem)
-
-void __rxe_drop_index(struct rxe_pool_elem *elem);
-
-#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem)
-
 /* assign a key to a keyed object and insert object into
  * pool's rb tree holding and not holding pool_lock
  */
@@ -136,11 +107,6 @@ void __rxe_drop_key(struct rxe_pool_elem *elem);
 
 #define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
 
-/* lookup an indexed object from index holding and not holding the pool_lock.
- * takes a reference on object
- */
-void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index);
-
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
 
 /* lookup keyed object from key holding and not holding the pool_lock.
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 07ca169110bf..e3f64eae088c 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -181,7 +181,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		return err;
 
 	/* create index > 0 */
-	rxe_add_index(ah);
 	ah->ah_num = ah->elem.index;
 
 	if (uresp) {
@@ -189,7 +188,6 @@ static int rxe_create_ah(struct ib_ah *ibah,
 		err = copy_to_user(&uresp->ah_num, &ah->ah_num,
 					 sizeof(uresp->ah_num));
 		if (err) {
-			rxe_drop_index(ah);
 			rxe_drop_ref(ah);
 			return -EFAULT;
 		}
@@ -230,7 +228,6 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
-	rxe_drop_index(ah);
 	rxe_drop_ref(ah);
 	return 0;
 }
@@ -437,7 +434,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	if (err)
 		return err;
 
-	rxe_add_index(qp);
 	err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibqp->pd, udata);
 	if (err)
 		goto qp_init;
@@ -445,7 +441,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 	return 0;
 
 qp_init:
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return err;
 }
@@ -490,7 +485,6 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	struct rxe_qp *qp = to_rqp(ibqp);
 
 	rxe_qp_destroy(qp);
-	rxe_drop_index(qp);
 	rxe_drop_ref(qp);
 	return 0;
 }
@@ -893,7 +887,6 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	rxe_add_index(mr);
 	rxe_add_ref(pd);
 	rxe_mr_init_dma(pd, access, mr);
 
@@ -917,7 +910,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 		goto err2;
 	}
 
-	rxe_add_index(mr);
 
 	rxe_add_ref(pd);
 
@@ -929,7 +921,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 
 err3:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err2:
 	return ERR_PTR(err);
@@ -952,8 +943,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		goto err1;
 	}
 
-	rxe_add_index(mr);
-
 	rxe_add_ref(pd);
 
 	err = rxe_mr_init_fast(pd, max_num_sg, mr);
@@ -964,7 +953,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 
 err2:
 	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
 	rxe_drop_ref(mr);
 err1:
 	return ERR_PTR(err);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 06/13] RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (4 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH v4 for-next 07/13] RDMA/rxe: Remove some #defines from rxe_pool.h Bob Pearson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

rxe_loc.h is already included in rxe.h so do not include it in rxe_pool.c

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index e54433b47365..ff3979807872 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -5,7 +5,6 @@
  */
 
 #include "rxe.h"
-#include "rxe_loc.h"
 
 static const struct rxe_type_info {
 	const char *name;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 for-next 07/13] RDMA/rxe: Remove some #defines from rxe_pool.h
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (5 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 06/13] RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH v4 for-next 08/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

RXE_POOL_ALIGN is only used in rxe_pool.c so move RXE_POOL_ALIGN
to rxe_pool.c from rxe_pool.h.
RXE_POOL_CACHE_FLAGS is never used so it is deleted from rxe_pool.h

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 2 ++
 drivers/infiniband/sw/rxe/rxe_pool.h | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index ff3979807872..05d56becc457 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -6,6 +6,8 @@
 
 #include "rxe.h"
 
+#define RXE_POOL_ALIGN		(16)
+
 static const struct rxe_type_info {
 	const char *name;
 	size_t size;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 163795d633e8..64e514189ee0 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -7,9 +7,6 @@
 #ifndef RXE_POOL_H
 #define RXE_POOL_H
 
-#define RXE_POOL_ALIGN		(16)
-#define RXE_POOL_CACHE_FLAGS	(0)
-
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
 	RXE_POOL_KEY		= BIT(2),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 for-next 08/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (6 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH v4 for-next 07/13] RDMA/rxe: Remove some #defines from rxe_pool.h Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects Bob Pearson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Since most rxe objects are now allocated in rdma-core change the
sense of RXE_POOL_NO_ALLOC to RXE_POOL_ALLOC. This makes the code
easier to understand.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 18 ++++++++----------
 drivers/infiniband/sw/rxe/rxe_pool.h |  2 +-
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 05d56becc457..6fa524efb6af 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -23,19 +23,17 @@ static const struct rxe_type_info {
 		.name		= "rxe-uc",
 		.size		= sizeof(struct rxe_ucontext),
 		.elem_offset	= offsetof(struct rxe_ucontext, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_PD] = {
 		.name		= "rxe-pd",
 		.size		= sizeof(struct rxe_pd),
 		.elem_offset	= offsetof(struct rxe_pd, elem),
-		.flags		= RXE_POOL_NO_ALLOC,
 	},
 	[RXE_TYPE_AH] = {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 	},
@@ -43,7 +41,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 	},
@@ -52,7 +50,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 	},
@@ -60,7 +58,6 @@ static const struct rxe_type_info {
 		.name		= "rxe-cq",
 		.size		= sizeof(struct rxe_cq),
 		.elem_offset	= offsetof(struct rxe_cq, elem),
-		.flags          = RXE_POOL_NO_ALLOC,
 		.cleanup	= rxe_cq_cleanup,
 	},
 	[RXE_TYPE_MR] = {
@@ -68,7 +65,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 	},
@@ -77,7 +74,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_NO_ALLOC,
+		.flags		= RXE_POOL_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
@@ -86,7 +83,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mc_grp),
 		.elem_offset	= offsetof(struct rxe_mc_grp, elem),
 		.cleanup	= rxe_mc_cleanup,
-		.flags		= RXE_POOL_KEY,
+		.flags		= RXE_POOL_KEY | RXE_POOL_ALLOC,
 		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
@@ -94,6 +91,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-mc_elem",
 		.size		= sizeof(struct rxe_mc_elem),
 		.elem_offset	= offsetof(struct rxe_mc_elem, elem),
+		.flags		= RXE_POOL_ALLOC,
 	},
 };
 
@@ -322,7 +320,7 @@ void rxe_elem_release(struct kref *kref)
 	if (pool->cleanup)
 		pool->cleanup(elem);
 
-	if (!(pool->flags & RXE_POOL_NO_ALLOC)) {
+	if (pool->flags & RXE_POOL_ALLOC) {
 		obj = elem->obj;
 		kfree(obj);
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 64e514189ee0..7299426190c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -10,7 +10,7 @@
 enum rxe_pool_flags {
 	RXE_POOL_INDEX		= BIT(1),
 	RXE_POOL_KEY		= BIT(2),
-	RXE_POOL_NO_ALLOC	= BIT(4),
+	RXE_POOL_ALLOC		= BIT(4),
 };
 
 enum rxe_elem_type {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (7 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH v4 for-next 08/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-19 17:41   ` Jason Gunthorpe
  2021-11-03  5:02 ` [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects Bob Pearson
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Replace the keyed type rxe objects by xarrays with a 64 bit
index. This was only used for mgids. Here construct an index
for each mgid that is very likely to be unique. With this change
there is no longer a requirement for xxx_locked versions of the
pool APIs.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  10 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 162 +++++++++++-----
 drivers/infiniband/sw/rxe/rxe_mw.c    |   2 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 256 +++++++++-----------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  67 +------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   3 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  32 ++--
 7 files changed, 230 insertions(+), 302 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b1e174afb1d4..b33a472eb347 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -40,17 +40,13 @@ void rxe_cq_disable(struct rxe_cq *cq);
 void rxe_cq_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mcast.c */
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p);
-
+struct rxe_mc_grp *rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+				     int alloc);
 int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mc_grp *grp);
-
 int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid);
-
+			   struct rxe_mc_grp *grp);
 void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
-
 void rxe_mc_cleanup(struct rxe_pool_elem *arg);
 
 /* rxe_mmap.c */
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index bd1ac88b8700..0fb1a7464a7f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -7,62 +7,120 @@
 #include "rxe.h"
 #include "rxe_loc.h"
 
-/* caller should hold mc_grp_pool->pool_lock */
-static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe,
-				     struct rxe_pool *pool,
-				     union ib_gid *mgid)
+/**
+ * rxe_mgid_to_index - converts a 128 bit mgid to a 64 bit
+ * index which is hopefully unique.
+ * @mgid: the multicast address
+ *
+ * Returns: an index from the mgid
+ */
+static unsigned long rxe_mgid_to_index(union ib_gid *mgid)
 {
-	int err;
-	struct rxe_mc_grp *grp;
+	__be32 *val32 = (__be32 *)mgid;
+	unsigned long index;
 
-	grp = rxe_alloc_locked(&rxe->mc_grp_pool);
-	if (!grp)
-		return ERR_PTR(-ENOMEM);
-
-	INIT_LIST_HEAD(&grp->qp_list);
-	spin_lock_init(&grp->mcg_lock);
-	grp->rxe = rxe;
-	rxe_add_key_locked(grp, mgid);
-
-	err = rxe_mcast_add(rxe, mgid);
-	if (unlikely(err)) {
-		rxe_drop_key_locked(grp);
-		rxe_drop_ref(grp);
-		return ERR_PTR(err);
+	if (mgid->raw[10] == 0xff && mgid->raw[11] == 0xff) {
+		if ((mgid->raw[12] & 0xf0) != 0xe0)
+			pr_info("mgid is not an ipv4 mc address\n");
+
+		/* mgid is a mapped IPV4 multicast address
+		 * use the 32 bits as an index which will be
+		 * unique
+		 */
+		index = be32_to_cpu(val32[3]);
+	} else {
+		if (mgid->raw[0] != 0xff)
+			pr_info("mgid is not an ipv6 mc address\n");
+
+		/* mgid is an IPV6 multicast address which won't
+		 * fit into the index so construct the index
+		 * from the four 32 bit words in mgid.
+		 * If there is a collision treat it like
+		 * no memory and return NULL
+		 */
+		index = be32_to_cpu(val32[0] ^ val32[1]);
+		index = (index << 32) | be32_to_cpu(val32[2] ^ val32[3]);
 	}
 
-	return grp;
+	return index;
 }
 
-int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
-		      struct rxe_mc_grp **grp_p)
+/**
+ * rxe_mcast_get_grp() - find or create mc_grp from mgid
+ * @rxe: the rdma device
+ * @mgid: the multicast address
+ * @alloc: if 0 just lookup else create a new group if lookup fails
+ *
+ * Returns: on success the mc_grp with a reference held else NULL
+ */
+struct rxe_mc_grp *rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
+				     int alloc)
 {
-	int err;
-	struct rxe_mc_grp *grp;
 	struct rxe_pool *pool = &rxe->mc_grp_pool;
+	struct rxe_mc_grp *grp;
+	struct rxe_mc_grp *old;
+	unsigned long index;
+	int err;
 
 	if (rxe->attr.max_mcast_qp_attach == 0)
-		return -EINVAL;
+		return NULL;
 
-	write_lock_bh(&pool->pool_lock);
+	index = rxe_mgid_to_index(mgid);
+	grp = rxe_pool_get_index(pool, index);
+	if (grp) {
+		if (memcmp(&grp->mgid, mgid, sizeof(*mgid)))
+			goto err_drop_ref;
 
-	grp = rxe_pool_get_key_locked(pool, mgid);
-	if (grp)
-		goto done;
+		return grp;
+	}
+
+	if (!alloc)
+		return NULL;
+
+	grp = rxe_alloc_index(pool, index);
+	if (!grp) {
+		/* either we ran out of memory or someone else just
+		 * inserted a new group at this index most likely
+		 * with the same mgid. If so use that one.
+		 */
+		old = rxe_pool_get_index(pool, index);
+		if (!old)
+			return NULL;
 
-	grp = create_grp(rxe, pool, mgid);
-	if (IS_ERR(grp)) {
-		write_unlock_bh(&pool->pool_lock);
-		err = PTR_ERR(grp);
-		return err;
+		if (memcmp(&old->mgid, mgid, sizeof(*mgid)))
+			goto err_drop_ref;
+
+		return old;
 	}
 
-done:
-	write_unlock_bh(&pool->pool_lock);
-	*grp_p = grp;
-	return 0;
+	memcpy(&grp->mgid, mgid, sizeof(*mgid));
+	INIT_LIST_HEAD(&grp->qp_list);
+	spin_lock_init(&grp->mcg_lock);
+	grp->rxe = rxe;
+
+	err = rxe_mcast_add(rxe, mgid);
+	if (err)
+		goto err_drop_ref;
+
+	return grp;
+
+err_drop_ref:
+	rxe_drop_ref(grp);
+	return NULL;
 }
 
+/**
+ * rxe_mcast_add_grp_elem() - attach a multicast group to a QP
+ * @rxe: the rdma device
+ * @qp: the queue pair
+ * @grp: the mc group
+ *
+ * Allocates a struct rxe_mc_elem which is simultaneously on a
+ * list of QPs attached to grp and on a list of mc groups attached
+ * to QP. Takes a ref on grp until grp is detached.
+ *
+ * Returns: 0 on success or -ENOMEM on failure
+ */
 int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			   struct rxe_mc_grp *grp)
 {
@@ -84,7 +142,7 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 		goto out;
 	}
 
-	elem = rxe_alloc_locked(&rxe->mc_elem_pool);
+	elem = rxe_alloc(&rxe->mc_elem_pool);
 	if (!elem) {
 		err = -ENOMEM;
 		goto out;
@@ -107,16 +165,22 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return err;
 }
 
+/**
+ * rxe_mcast_drop_grp_elem() - detach a multicast group from a QP
+ * @rxe: the rdma device
+ * @qp: the queue pair
+ * @grp: the mc group
+ *
+ * Searches the list of QPs attached to the mc group and then
+ * removes the attachment. Drops the ref on grp and the attachment.
+ *
+ * Returns: 0 on success or -EINVAL on failure if not found
+ */
 int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
-			    union ib_gid *mgid)
+			   struct rxe_mc_grp *grp)
 {
-	struct rxe_mc_grp *grp;
 	struct rxe_mc_elem *elem, *tmp;
 
-	grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
-	if (!grp)
-		goto err1;
-
 	spin_lock_bh(&qp->grp_lock);
 	spin_lock_bh(&grp->mcg_lock);
 
@@ -130,15 +194,12 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 			spin_unlock_bh(&qp->grp_lock);
 			rxe_drop_ref(elem);
 			rxe_drop_ref(grp);	/* ref held by QP */
-			rxe_drop_ref(grp);	/* ref from get_key */
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&grp->mcg_lock);
 	spin_unlock_bh(&qp->grp_lock);
-	rxe_drop_ref(grp);			/* ref from get_key */
-err1:
 	return -EINVAL;
 }
 
@@ -163,7 +224,7 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 		list_del(&elem->qp_list);
 		grp->num_qp--;
 		spin_unlock_bh(&grp->mcg_lock);
-		rxe_drop_ref(grp);
+		rxe_drop_ref(grp);		/* ref held by QP */
 		rxe_drop_ref(elem);
 	}
 }
@@ -173,6 +234,5 @@ void rxe_mc_cleanup(struct rxe_pool_elem *elem)
 	struct rxe_mc_grp *grp = container_of(elem, typeof(*grp), elem);
 	struct rxe_dev *rxe = grp->rxe;
 
-	rxe_drop_key(grp);
 	rxe_mcast_delete(rxe, &grp->mgid);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index 3ae981d77c25..8586361eb7ef 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -14,7 +14,7 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 
 	rxe_add_ref(pd);
 
-	ret = rxe_add_to_pool(&rxe->mw_pool, mw);
+	ret = rxe_pool_add(&rxe->mw_pool, mw);
 	if (ret) {
 		rxe_drop_ref(pd);
 		return ret;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 6fa524efb6af..863fa62da077 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -33,7 +33,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-ah",
 		.size		= sizeof(struct rxe_ah),
 		.elem_offset	= offsetof(struct rxe_ah, elem),
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_AUTO_INDEX,
 		.min_index	= RXE_MIN_AH_INDEX,
 		.max_index	= RXE_MAX_AH_INDEX,
 	},
@@ -41,7 +41,7 @@ static const struct rxe_type_info {
 		.name		= "rxe-srq",
 		.size		= sizeof(struct rxe_srq),
 		.elem_offset	= offsetof(struct rxe_srq, elem),
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_AUTO_INDEX,
 		.min_index	= RXE_MIN_SRQ_INDEX,
 		.max_index	= RXE_MAX_SRQ_INDEX,
 	},
@@ -50,7 +50,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_qp),
 		.elem_offset	= offsetof(struct rxe_qp, elem),
 		.cleanup	= rxe_qp_cleanup,
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_AUTO_INDEX,
 		.min_index	= RXE_MIN_QP_INDEX,
 		.max_index	= RXE_MAX_QP_INDEX,
 	},
@@ -65,7 +65,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mr),
 		.elem_offset	= offsetof(struct rxe_mr, elem),
 		.cleanup	= rxe_mr_cleanup,
-		.flags		= RXE_POOL_INDEX | RXE_POOL_ALLOC,
+		.flags		= RXE_POOL_AUTO_INDEX | RXE_POOL_ALLOC,
 		.min_index	= RXE_MIN_MR_INDEX,
 		.max_index	= RXE_MAX_MR_INDEX,
 	},
@@ -74,7 +74,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mw),
 		.elem_offset	= offsetof(struct rxe_mw, elem),
 		.cleanup	= rxe_mw_cleanup,
-		.flags		= RXE_POOL_INDEX,
+		.flags		= RXE_POOL_AUTO_INDEX,
 		.min_index	= RXE_MIN_MW_INDEX,
 		.max_index	= RXE_MAX_MW_INDEX,
 	},
@@ -83,7 +83,7 @@ static const struct rxe_type_info {
 		.size		= sizeof(struct rxe_mc_grp),
 		.elem_offset	= offsetof(struct rxe_mc_grp, elem),
 		.cleanup	= rxe_mc_cleanup,
-		.flags		= RXE_POOL_KEY | RXE_POOL_ALLOC,
+		.flags		= RXE_POOL_EXT_INDEX | RXE_POOL_ALLOC,
 		.key_offset	= offsetof(struct rxe_mc_grp, mgid),
 		.key_size	= sizeof(union ib_gid),
 	},
@@ -118,109 +118,42 @@ void rxe_pool_init(
 
 	rwlock_init(&pool->pool_lock);
 
-	if (pool->flags & RXE_POOL_INDEX) {
+	if (pool->flags & RXE_POOL_AUTO_INDEX) {
 		xa_init_flags(&pool->xarray.xa, XA_FLAGS_ALLOC);
 		pool->xarray.limit.max = info->max_index;
 		pool->xarray.limit.min = info->min_index;
 	}
-
-	if (pool->flags & RXE_POOL_KEY) {
-		pool->key.tree = RB_ROOT;
-		pool->key.key_offset = info->key_offset;
-		pool->key.key_size = info->key_size;
-	}
 }
 
 void rxe_pool_cleanup(struct rxe_pool *pool)
 {
 	if (atomic_read(&pool->num_elem) > 0)
-		pr_warn("%s pool destroyed with unfree'd elem\n",
-			pool->name);
-}
-
-static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new)
-{
-	struct rb_node **link = &pool->key.tree.rb_node;
-	struct rb_node *parent = NULL;
-	struct rxe_pool_elem *elem;
-	int cmp;
-
-	while (*link) {
-		parent = *link;
-		elem = rb_entry(parent, struct rxe_pool_elem, key_node);
+		pr_warn("%s pool destroyed with %d unfree'd elem\n",
+			pool->name, atomic_read(&pool->num_elem));
 
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     (u8 *)new + pool->key.key_offset,
-			     pool->key.key_size);
-
-		if (cmp == 0) {
-			pr_warn("key already exists!\n");
-			return -EINVAL;
-		}
-
-		if (cmp > 0)
-			link = &(*link)->rb_left;
-		else
-			link = &(*link)->rb_right;
-	}
-
-	rb_link_node(&new->key_node, parent, link);
-	rb_insert_color(&new->key_node, &pool->key.tree);
-
-	return 0;
+	if (pool->flags & (RXE_POOL_AUTO_INDEX | RXE_POOL_EXT_INDEX))
+		xa_destroy(&pool->xarray.xa);
 }
 
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size);
-	err = rxe_insert_key(pool, elem);
-
-	return err;
-}
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key)
-{
-	struct rxe_pool *pool = elem->pool;
-	int err;
-
-	write_lock_bh(&pool->pool_lock);
-	err = __rxe_add_key_locked(elem, key);
-	write_unlock_bh(&pool->pool_lock);
-
-	return err;
-}
-
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	rb_erase(&elem->key_node, &pool->key.tree);
-}
-
-void __rxe_drop_key(struct rxe_pool_elem *elem)
-{
-	struct rxe_pool *pool = elem->pool;
-
-	write_lock_bh(&pool->pool_lock);
-	__rxe_drop_key_locked(elem);
-	write_unlock_bh(&pool->pool_lock);
-}
-
-void *rxe_alloc_locked(struct rxe_pool *pool)
+void *rxe_alloc(struct rxe_pool *pool)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
 	int err;
 
+	if (!(pool->flags & RXE_POOL_ALLOC) ||
+	     (pool->flags & RXE_POOL_EXT_INDEX)) {
+		pr_info("%s called with pool->flags = 0x%x\n",
+				__func__, pool->flags);
+		return NULL;
+	}
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_count;
 
-	obj = kzalloc(pool->elem_size, GFP_ATOMIC);
+	obj = kzalloc(pool->elem_size, GFP_KERNEL);
 	if (!obj)
-		goto out_cnt;
+		goto err_count;
 
 	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
@@ -228,111 +161,116 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
-	if (pool->flags & RXE_POOL_INDEX) {
-		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
+	if (pool->flags & RXE_POOL_AUTO_INDEX) {
+		u32 index;
+
+		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &index, elem,
 					 pool->xarray.limit,
-					 &pool->xarray.next, GFP_ATOMIC);
+					 &pool->xarray.next, GFP_KERNEL);
 		if (err)
-			goto out_free;
+			goto err_free;
+
+		elem->index = index;
 	}
 
 	return obj;
 
-out_free:
+err_free:
 	kfree(obj);
-out_cnt:
+err_count:
 	atomic_dec(&pool->num_elem);
 	return NULL;
 }
 
-void *rxe_alloc(struct rxe_pool *pool)
+void *rxe_alloc_index(struct rxe_pool *pool, unsigned long index)
 {
 	struct rxe_pool_elem *elem;
 	void *obj;
-	int err;
+	void *old;
+
+	if (!(pool->flags & RXE_POOL_ALLOC) ||
+	    !(pool->flags & RXE_POOL_EXT_INDEX)) {
+		pr_info("%s called with pool->flags = 0x%x\n",
+				__func__, pool->flags);
+		return NULL;
+	}
 
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_count;
 
 	obj = kzalloc(pool->elem_size, GFP_KERNEL);
 	if (!obj)
-		goto out_cnt;
+		goto err_count;
 
 	elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset);
 
 	elem->pool = pool;
 	elem->obj = obj;
+	elem->index = index;
 	kref_init(&elem->ref_cnt);
 
-	if (pool->flags & RXE_POOL_INDEX) {
-		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
-					 pool->xarray.limit,
-					 &pool->xarray.next, GFP_KERNEL);
-		if (err)
-			goto out_free;
-	}
+	old = xa_cmpxchg(&pool->xarray.xa, index, NULL, elem, GFP_KERNEL);
+	if (old)
+		goto err_free;
 
 	return obj;
 
-out_free:
+err_free:
 	kfree(obj);
-out_cnt:
+err_count:
 	atomic_dec(&pool->num_elem);
 	return NULL;
 }
 
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
+int __rxe_pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err = -EINVAL;
 
+	if ((pool->flags & RXE_POOL_ALLOC) ||
+	    (pool->flags & RXE_POOL_EXT_INDEX)) {
+		pr_info("%s called with pool->flags = 0x%x\n",
+				__func__, pool->flags);
+		goto err_out;
+	}
+
 	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
-		goto out_cnt;
+		goto err_count;
 
 	elem->pool = pool;
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	if (pool->flags & RXE_POOL_INDEX) {
-		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
+	if (pool->flags & RXE_POOL_AUTO_INDEX) {
+		u32 index;
+
+		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &index, elem,
 					 pool->xarray.limit,
 					 &pool->xarray.next, GFP_KERNEL);
 		if (err)
-			goto out_cnt;
+			goto err_count;
+
+		elem->index = index;
 	}
 
 	return 0;
 
-out_cnt:
+err_count:
 	atomic_dec(&pool->num_elem);
+err_out:
 	return err;
 }
 
-void rxe_elem_release(struct kref *kref)
+void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index)
 {
-	struct rxe_pool_elem *elem =
-		container_of(kref, struct rxe_pool_elem, ref_cnt);
-	struct rxe_pool *pool = elem->pool;
+	struct rxe_pool_elem *elem;
 	void *obj;
 
-	if (pool->flags & RXE_POOL_INDEX)
-		xa_erase(&pool->xarray.xa, elem->index);
-
-	if (pool->cleanup)
-		pool->cleanup(elem);
-
-	if (pool->flags & RXE_POOL_ALLOC) {
-		obj = elem->obj;
-		kfree(obj);
+	if (!(pool->flags & (RXE_POOL_AUTO_INDEX | RXE_POOL_EXT_INDEX))) {
+		pr_info("%s called with pool->flags = 0x%x\n",
+				__func__, pool->flags);
+		return NULL;
 	}
 
-	atomic_dec(&pool->num_elem);
-}
-
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
-{
-	struct rxe_pool_elem *elem;
-	void *obj;
-
 	elem = xa_load(&pool->xarray.xa, index);
 	if (elem) {
 		kref_get(&elem->ref_cnt);
@@ -344,46 +282,20 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
 	return obj;
 }
 
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key)
+void rxe_elem_release(struct kref *kref)
 {
-	struct rb_node *node;
-	struct rxe_pool_elem *elem;
-	void *obj;
-	int cmp;
-
-	node = pool->key.tree.rb_node;
-
-	while (node) {
-		elem = rb_entry(node, struct rxe_pool_elem, key_node);
-
-		cmp = memcmp((u8 *)elem + pool->key.key_offset,
-			     key, pool->key.key_size);
-
-		if (cmp > 0)
-			node = node->rb_left;
-		else if (cmp < 0)
-			node = node->rb_right;
-		else
-			break;
-	}
-
-	if (node) {
-		kref_get(&elem->ref_cnt);
-		obj = elem->obj;
-	} else {
-		obj = NULL;
-	}
+	struct rxe_pool_elem *elem = container_of(kref, struct rxe_pool_elem,
+						  ref_cnt);
+	struct rxe_pool *pool = elem->pool;
 
-	return obj;
-}
+	if (pool->cleanup)
+		pool->cleanup(elem);
 
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
-{
-	void *obj;
+	if (pool->flags & (RXE_POOL_AUTO_INDEX | RXE_POOL_EXT_INDEX))
+		xa_erase(&pool->xarray.xa, elem->index);
 
-	read_lock_bh(&pool->pool_lock);
-	obj = rxe_pool_get_key_locked(pool, key);
-	read_unlock_bh(&pool->pool_lock);
+	if (pool->flags & RXE_POOL_ALLOC)
+		kfree(elem->obj);
 
-	return obj;
+	atomic_dec(&pool->num_elem);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 7299426190c8..6cd2366d5407 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -8,9 +8,9 @@
 #define RXE_POOL_H
 
 enum rxe_pool_flags {
-	RXE_POOL_INDEX		= BIT(1),
-	RXE_POOL_KEY		= BIT(2),
-	RXE_POOL_ALLOC		= BIT(4),
+	RXE_POOL_AUTO_INDEX	= BIT(1),
+	RXE_POOL_EXT_INDEX	= BIT(2),
+	RXE_POOL_ALLOC		= BIT(3),
 };
 
 enum rxe_elem_type {
@@ -32,12 +32,7 @@ struct rxe_pool_elem {
 	void			*obj;
 	struct kref		ref_cnt;
 	struct list_head	list;
-
-	/* only used if keyed */
-	struct rb_node		key_node;
-
-	/* only used if indexed */
-	u32			index;
+	unsigned long		index;
 };
 
 struct rxe_pool {
@@ -59,67 +54,25 @@ struct rxe_pool {
 		struct xa_limit		limit;
 		u32			next;
 	} xarray;
-
-	/* only used if keyed */
-	struct {
-		struct rb_root		tree;
-		size_t			key_offset;
-		size_t			key_size;
-	} key;
 };
 
 void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
-		  enum rxe_elem_type type, u32 max_elem);
+		   enum rxe_elem_type type, u32 max_elem);
 
-/* free resources from object pool */
 void rxe_pool_cleanup(struct rxe_pool *pool);
 
-/* allocate an object from pool holding and not holding the pool lock */
-void *rxe_alloc_locked(struct rxe_pool *pool);
-
 void *rxe_alloc(struct rxe_pool *pool);
 
-/* connect already allocated object to pool */
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
-
-#define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
-
-/* assign a key to a keyed object and insert object into
- * pool's rb tree holding and not holding pool_lock
- */
-int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key);
-
-#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key)
-
-int __rxe_add_key(struct rxe_pool_elem *elem, void *key);
+void *rxe_alloc_index(struct rxe_pool *pool, unsigned long index);
 
-#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key)
+int __rxe_pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem);
+#define rxe_pool_add(pool, obj) __rxe_pool_add(pool, &(obj)->elem)
 
-/* remove elem from rb tree holding and not holding the pool_lock */
-void __rxe_drop_key_locked(struct rxe_pool_elem *elem);
+void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index);
 
-#define rxe_drop_key_locked(obj) __rxe_drop_key_locked(&(obj)->elem)
-
-void __rxe_drop_key(struct rxe_pool_elem *elem);
-
-#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem)
-
-void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
-
-/* lookup keyed object from key holding and not holding the pool_lock.
- * takes a reference on the objecti
- */
-void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key);
-
-void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
-
-/* cleanup an object when all references are dropped */
-void rxe_elem_release(struct kref *kref);
-
-/* take a reference on an object */
 #define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
 
-/* drop a reference on an object */
+void rxe_elem_release(struct kref *kref);
 #define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 6a6cc1fa90e4..780f7902f103 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -245,8 +245,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 	else if (skb->protocol == htons(ETH_P_IPV6))
 		memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid));
 
-	/* lookup mcast group corresponding to mgid, takes a ref */
-	mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid);
+	mcg = rxe_mcast_get_grp(rxe, &dgid, 0);
 	if (!mcg)
 		goto drop;	/* mcast group not registered */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index e3f64eae088c..7d5b4939ed6d 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -108,7 +108,7 @@ static int rxe_alloc_ucontext(struct ib_ucontext *ibuc, struct ib_udata *udata)
 	struct rxe_dev *rxe = to_rdev(ibuc->device);
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
-	return rxe_add_to_pool(&rxe->uc_pool, uc);
+	return rxe_pool_add(&rxe->uc_pool, uc);
 }
 
 static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
@@ -142,7 +142,7 @@ static int rxe_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 	struct rxe_dev *rxe = to_rdev(ibpd->device);
 	struct rxe_pd *pd = to_rpd(ibpd);
 
-	return rxe_add_to_pool(&rxe->pd_pool, pd);
+	return rxe_pool_add(&rxe->pd_pool, pd);
 }
 
 static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
@@ -176,7 +176,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	if (err)
 		return err;
 
-	err = rxe_add_to_pool(&rxe->ah_pool, ah);
+	err = rxe_pool_add(&rxe->ah_pool, ah);
 	if (err)
 		return err;
 
@@ -299,7 +299,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	if (err)
 		goto err1;
 
-	err = rxe_add_to_pool(&rxe->srq_pool, srq);
+	err = rxe_pool_add(&rxe->srq_pool, srq);
 	if (err)
 		goto err1;
 
@@ -430,7 +430,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 		qp->is_user = false;
 	}
 
-	err = rxe_add_to_pool(&rxe->qp_pool, qp);
+	err = rxe_pool_add(&rxe->qp_pool, qp);
 	if (err)
 		return err;
 
@@ -787,7 +787,7 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (err)
 		return err;
 
-	return rxe_add_to_pool(&rxe->cq_pool, cq);
+	return rxe_pool_add(&rxe->cq_pool, cq);
 }
 
 static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
@@ -984,15 +984,14 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 
 static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
-	int err;
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
 	struct rxe_mc_grp *grp;
+	int err;
 
-	/* takes a ref on grp if successful */
-	err = rxe_mcast_get_grp(rxe, mgid, &grp);
-	if (err)
-		return err;
+	grp = rxe_mcast_get_grp(rxe, mgid, 1);
+	if (!grp)
+		return -ENOMEM;
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
 
@@ -1004,8 +1003,17 @@ static int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 {
 	struct rxe_dev *rxe = to_rdev(ibqp->device);
 	struct rxe_qp *qp = to_rqp(ibqp);
+	struct rxe_mc_grp *grp;
+	int err;
+
+	grp = rxe_mcast_get_grp(rxe, mgid, 0);
+	if (!grp)
+		return -EINVAL;
 
-	return rxe_mcast_drop_grp_elem(rxe, qp, mgid);
+	err = rxe_mcast_drop_grp_elem(rxe, qp, grp);
+
+	rxe_drop_ref(grp);
+	return err;
 }
 
 static ssize_t parent_show(struct device *device,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (8 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-19 17:45   ` Jason Gunthorpe
  2021-11-03  5:02 ` [PATCH for-next v4 11/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe_add_ref() calls kref_get() which increments the
reference count even if the object has already been released.
Change this to refcount_inc_not_zero() which will only succeed
if the current ref count is larger than zero. This change exposes
some reference counting errors which will be fixed in the following
patches.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 6cd2366d5407..46f2abc359f3 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -7,6 +7,8 @@
 #ifndef RXE_POOL_H
 #define RXE_POOL_H
 
+#include <linux/refcount.h>
+
 enum rxe_pool_flags {
 	RXE_POOL_AUTO_INDEX	= BIT(1),
 	RXE_POOL_EXT_INDEX	= BIT(2),
@@ -70,9 +72,15 @@ int __rxe_pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem);
 
 void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index);
 
-#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
+static inline int __rxe_add_ref(struct rxe_pool_elem *elem)
+{
+	return refcount_inc_not_zero(&elem->ref_cnt.refcount);
+}
+
+#define rxe_add_ref(obj) __rxe_add_ref(&(obj)->elem)
 
 void rxe_elem_release(struct kref *kref);
+
 #define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release)
 
 #endif /* RXE_POOL_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 11/13] RDMA/rxe: Fix ref error in rxe_av.c
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (9 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 12/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

The commit referenced below can take a reference to the AH which is
never dropped. This only happens in the UD request path. This patch
optionally passes that AH back to the caller so that it can hold the
reference while the AV is being accessed and then drop it. Code to
do this is added to rxe_req.c. The AV is also passed to rxe_prepare
in rxe_net.c as an optimization.

Fixes: e2fe06c90806 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_av.c   | 19 +++++++++-
 drivers/infiniband/sw/rxe/rxe_loc.h  |  5 ++-
 drivers/infiniband/sw/rxe/rxe_net.c  | 17 +++++----
 drivers/infiniband/sw/rxe/rxe_req.c  | 55 +++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_resp.c |  2 +-
 5 files changed, 63 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 38c7b6fb39d7..360a567159fe 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -99,11 +99,14 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr)
 	av->network_type = type;
 }
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp)
 {
 	struct rxe_ah *ah;
 	u32 ah_num;
 
+	if (ahp)
+		*ahp = NULL;
+
 	if (!pkt || !pkt->qp)
 		return NULL;
 
@@ -117,10 +120,22 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
 	if (ah_num) {
 		/* only new user provider or kernel client */
 		ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num);
-		if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) {
+		if (!ah) {
 			pr_warn("Unable to find AH matching ah_num\n");
 			return NULL;
 		}
+
+		if (rxe_ah_pd(ah) != pkt->qp->pd) {
+			pr_warn("PDs don't match for AH and QP\n");
+			rxe_drop_ref(ah);
+			return NULL;
+		}
+
+		if (ahp)
+			*ahp = ah;
+		else
+			rxe_drop_ref(ah);
+
 		return &ah->av;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b33a472eb347..1317a9c76f31 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -19,7 +19,7 @@ void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
 
 void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr);
 
-struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt);
+struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp);
 
 /* rxe_cq.c */
 int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
@@ -98,7 +98,8 @@ void rxe_mw_cleanup(struct rxe_pool_elem *arg);
 /* rxe_net.c */
 struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 				int paylen, struct rxe_pkt_info *pkt);
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb);
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		    struct sk_buff *skb);
 const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 2cb810cb890a..456e960cacd7 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -293,13 +293,13 @@ static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
 	ip6h->payload_len = htons(skb->len - sizeof(*ip6h));
 }
 
-static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare4(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
 	bool xnet = false;
 	__be16 df = htons(IP_DF);
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in_addr *saddr = &av->sgid_addr._sockaddr_in.sin_addr;
 	struct in_addr *daddr = &av->dgid_addr._sockaddr_in.sin_addr;
 
@@ -319,11 +319,11 @@ static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+static int prepare6(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		    struct sk_buff *skb)
 {
 	struct rxe_qp *qp = pkt->qp;
 	struct dst_entry *dst;
-	struct rxe_av *av = rxe_get_av(pkt);
 	struct in6_addr *saddr = &av->sgid_addr._sockaddr_in6.sin6_addr;
 	struct in6_addr *daddr = &av->dgid_addr._sockaddr_in6.sin6_addr;
 
@@ -344,16 +344,17 @@ static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	return 0;
 }
 
-int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb)
+int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
+		struct sk_buff *skb)
 {
 	int err = 0;
 
 	if (skb->protocol == htons(ETH_P_IP))
-		err = prepare4(pkt, skb);
+		err = prepare4(av, pkt, skb);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		err = prepare6(pkt, skb);
+		err = prepare6(av, pkt, skb);
 
-	if (ether_addr_equal(skb->dev->dev_addr, rxe_get_av(pkt)->dmac))
+	if (ether_addr_equal(skb->dev->dev_addr, av->dmac))
 		pkt->mask |= RXE_LOOPBACK_MASK;
 
 	return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index c8d674da5cc2..7bc1ec8a5aa6 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -358,6 +358,7 @@ static inline int get_mtu(struct rxe_qp *qp)
 }
 
 static struct sk_buff *init_req_packet(struct rxe_qp *qp,
+				       struct rxe_av *av,
 				       struct rxe_send_wqe *wqe,
 				       int opcode, int payload,
 				       struct rxe_pkt_info *pkt)
@@ -365,7 +366,6 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	struct rxe_dev		*rxe = to_rdev(qp->ibqp.device);
 	struct sk_buff		*skb;
 	struct rxe_send_wr	*ibwr = &wqe->wr;
-	struct rxe_av		*av;
 	int			pad = (-payload) & 0x3;
 	int			paylen;
 	int			solicited;
@@ -375,21 +375,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 
 	/* length from start of bth to end of icrc */
 	paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE;
-
-	/* pkt->hdr, port_num and mask are initialized in ifc layer */
-	pkt->rxe	= rxe;
-	pkt->opcode	= opcode;
-	pkt->qp		= qp;
-	pkt->psn	= qp->req.psn;
-	pkt->mask	= rxe_opcode[opcode].mask;
-	pkt->paylen	= paylen;
-	pkt->wqe	= wqe;
+	pkt->paylen = paylen;
 
 	/* init skb */
-	av = rxe_get_av(pkt);
-	if (!av)
-		return NULL;
-
 	skb = rxe_init_packet(rxe, av, paylen, pkt);
 	if (unlikely(!skb))
 		return NULL;
@@ -450,13 +438,13 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
 	return skb;
 }
 
-static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-		       struct rxe_pkt_info *pkt, struct sk_buff *skb,
-		       int paylen)
+static int finish_packet(struct rxe_qp *qp, struct rxe_av *av,
+			 struct rxe_send_wqe *wqe, struct rxe_pkt_info *pkt,
+			 struct sk_buff *skb, int paylen)
 {
 	int err;
 
-	err = rxe_prepare(pkt, skb);
+	err = rxe_prepare(av, pkt, skb);
 	if (err)
 		return err;
 
@@ -611,6 +599,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 int rxe_requester(void *arg)
 {
 	struct rxe_qp *qp = (struct rxe_qp *)arg;
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	struct rxe_pkt_info pkt;
 	struct sk_buff *skb;
 	struct rxe_send_wqe *wqe;
@@ -622,6 +611,8 @@ int rxe_requester(void *arg)
 	struct rxe_send_wqe rollback_wqe;
 	u32 rollback_psn;
 	struct rxe_queue *q = qp->sq.queue;
+	struct rxe_ah *ah;
+	struct rxe_av *av;
 
 	rxe_add_ref(qp);
 
@@ -708,14 +699,28 @@ int rxe_requester(void *arg)
 		payload = mtu;
 	}
 
-	skb = init_req_packet(qp, wqe, opcode, payload, &pkt);
+	pkt.rxe = rxe;
+	pkt.opcode = opcode;
+	pkt.qp = qp;
+	pkt.psn = qp->req.psn;
+	pkt.mask = rxe_opcode[opcode].mask;
+	pkt.wqe = wqe;
+
+	av = rxe_get_av(&pkt, &ah);
+	if (unlikely(!av)) {
+		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
+		wqe->status = IB_WC_LOC_QP_OP_ERR;
+		goto err_drop_ah;
+	}
+
+	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
 	if (unlikely(!skb)) {
 		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
 		wqe->status = IB_WC_LOC_QP_OP_ERR;
-		goto err;
+		goto err_drop_ah;
 	}
 
-	ret = finish_packet(qp, wqe, &pkt, skb, payload);
+	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
 	if (unlikely(ret)) {
 		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
 		if (ret == -EFAULT)
@@ -723,9 +728,12 @@ int rxe_requester(void *arg)
 		else
 			wqe->status = IB_WC_LOC_QP_OP_ERR;
 		kfree_skb(skb);
-		goto err;
+		goto err_drop_ah;
 	}
 
+	if (ah)
+		rxe_drop_ref(ah);
+
 	/*
 	 * To prevent a race on wqe access between requester and completer,
 	 * wqe members state and psn need to be set before calling
@@ -754,6 +762,9 @@ int rxe_requester(void *arg)
 
 	goto next_wqe;
 
+err_drop_ah:
+	if (ah)
+		rxe_drop_ref(ah);
 err:
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index e8f435fa6e4d..f589f4dde35c 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -632,7 +632,7 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	if (ack->mask & RXE_ATMACK_MASK)
 		atmack_set_orig(ack, qp->resp.atomic_orig);
 
-	err = rxe_prepare(ack, skb);
+	err = rxe_prepare(&qp->pri_av, ack, skb);
 	if (err) {
 		kfree_skb(skb);
 		return NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 12/13] RDMA/rxe: Replace mr by rkey in responder resources
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (10 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 11/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-03  5:02 ` [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref Bob Pearson
  2021-11-19 17:59 ` [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Jason Gunthorpe
  13 siblings, 0 replies; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Currently rxe saves a copy of MR in responder resources for RDMA reads.
Since the responder resources are never freed just over written if
more are needed this MR may not have a reference freed until the QP
is destroyed. This patch uses the rkey instead of the MR and on
subsequent packets of a multipacket read reply message it looks up the
MR from the rkey for each packet. This makes it possible for a user
to deregister an MR or unbind a MW on the fly and get correct behaviour.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  10 +--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 123 ++++++++++++++++++--------
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 -
 3 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 864bb3ef145f..4922a26bb5fc 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -135,12 +135,8 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
-	if (res->type == RXE_ATOMIC_MASK) {
+	if (res->type == RXE_ATOMIC_MASK)
 		kfree_skb(res->atomic.skb);
-	} else if (res->type == RXE_READ_MASK) {
-		if (res->read.mr)
-			rxe_drop_ref(res->read.mr);
-	}
 	res->type = 0;
 }
 
@@ -816,10 +812,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 	if (qp->pd)
 		rxe_drop_ref(qp->pd);
 
-	if (qp->resp.mr) {
+	if (qp->resp.mr)
 		rxe_drop_ref(qp->resp.mr);
-		qp->resp.mr = NULL;
-	}
 
 	if (qp_type(qp) == IB_QPT_RC)
 		sk_dst_reset(qp->sk->sk);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f589f4dde35c..c776289842e5 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -641,6 +641,78 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	return skb;
 }
 
+static struct resp_res *rxe_prepare_read_res(struct rxe_qp *qp,
+					struct rxe_pkt_info *pkt)
+{
+	struct resp_res *res;
+	u32 pkts;
+
+	res = &qp->resp.resources[qp->resp.res_head];
+	rxe_advance_resp_resource(qp);
+	free_rd_atomic_resource(qp, res);
+
+	res->type = RXE_READ_MASK;
+	res->replay = 0;
+	res->read.va = qp->resp.va + qp->resp.offset;
+	res->read.va_org = qp->resp.va + qp->resp.offset;
+	res->read.resid = qp->resp.resid;
+	res->read.length = qp->resp.resid;
+	res->read.rkey = qp->resp.rkey;
+
+	pkts = max_t(u32, (reth_len(pkt) + qp->mtu - 1)/qp->mtu, 1);
+	res->first_psn = pkt->psn;
+	res->cur_psn = pkt->psn;
+	res->last_psn = (pkt->psn + pkts - 1) & BTH_PSN_MASK;
+
+	res->state = rdatm_res_state_new;
+
+	return res;
+}
+
+/**
+ * rxe_recheck_mr - revalidate MR from rkey and get a reference
+ * @qp: the qp
+ * @rkey: the rkey
+ *
+ * This code allows the MR to be invalidated or deregistered or
+ * the MW if one was used to be invalidated or deallocated.
+ * It is assumed that the access permissions if originally good
+ * are OK and the mappings to be unchanged.
+ *
+ * Return: mr on success else NULL
+ */
+static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+	struct rxe_mr *mr;
+	struct rxe_mw *mw;
+
+	if (rkey_is_mw(rkey)) {
+		mw = rxe_pool_get_index(&rxe->mw_pool, rkey >> 8);
+		if (!mw || mw->rkey != rkey)
+			return NULL;
+
+		if (mw->state != RXE_MW_STATE_VALID) {
+			rxe_drop_ref(mw);
+			return NULL;
+		}
+
+		mr = mw->mr;
+		rxe_drop_ref(mw);
+	} else {
+		mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
+		if (!mr || mr->rkey != rkey)
+			return NULL;
+	}
+
+	if (mr->state != RXE_MR_STATE_VALID) {
+		rxe_drop_ref(mr);
+		return NULL;
+	}
+
+	return mr;
+}
+
 /* RDMA read response. If res is not NULL, then we have a current RDMA request
  * being processed or replayed.
  */
@@ -655,53 +727,26 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	int opcode;
 	int err;
 	struct resp_res *res = qp->resp.res;
+	struct rxe_mr *mr;
 
 	if (!res) {
-		/* This is the first time we process that request. Get a
-		 * resource
-		 */
-		res = &qp->resp.resources[qp->resp.res_head];
-
-		free_rd_atomic_resource(qp, res);
-		rxe_advance_resp_resource(qp);
-
-		res->type		= RXE_READ_MASK;
-		res->replay		= 0;
-
-		res->read.va		= qp->resp.va +
-					  qp->resp.offset;
-		res->read.va_org	= qp->resp.va +
-					  qp->resp.offset;
-
-		res->first_psn		= req_pkt->psn;
-
-		if (reth_len(req_pkt)) {
-			res->last_psn	= (req_pkt->psn +
-					   (reth_len(req_pkt) + mtu - 1) /
-					   mtu - 1) & BTH_PSN_MASK;
-		} else {
-			res->last_psn	= res->first_psn;
-		}
-		res->cur_psn		= req_pkt->psn;
-
-		res->read.resid		= qp->resp.resid;
-		res->read.length	= qp->resp.resid;
-		res->read.rkey		= qp->resp.rkey;
-
-		/* note res inherits the reference to mr from qp */
-		res->read.mr		= qp->resp.mr;
-		qp->resp.mr		= NULL;
-
-		qp->resp.res		= res;
-		res->state		= rdatm_res_state_new;
+		res = rxe_prepare_read_res(qp, req_pkt);
+		qp->resp.res = res;
 	}
 
 	if (res->state == rdatm_res_state_new) {
+		mr = qp->resp.mr;
+		qp->resp.mr = NULL;
+
 		if (res->read.resid <= mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY;
 		else
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
 	} else {
+		mr = rxe_recheck_mr(qp, res->read.rkey);
+		if (!mr)
+			return RESPST_ERR_RKEY_VIOLATION;
+
 		if (res->read.resid > mtu)
 			opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
 		else
@@ -717,10 +762,12 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	if (!skb)
 		return RESPST_ERR_RNR;
 
-	err = rxe_mr_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt),
+	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
 	if (err)
 		pr_err("Failed copying memory\n");
+	if (mr)
+		rxe_drop_ref(mr);
 
 	if (bth_pad(&ack_pkt)) {
 		u8 *pad = payload_addr(&ack_pkt) + payload;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index caf1ce118765..022abba4fb6b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -157,7 +157,6 @@ struct resp_res {
 			struct sk_buff	*skb;
 		} atomic;
 		struct {
-			struct rxe_mr	*mr;
 			u64		va_org;
 			u32		rkey;
 			u32		length;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (11 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 12/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
@ 2021-11-03  5:02 ` Bob Pearson
  2021-11-19 17:46   ` Jason Gunthorpe
  2021-11-19 17:59 ` [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Jason Gunthorpe
  13 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-03  5:02 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma; +Cc: Bob Pearson

Use refcount_inc_not_zero instead of kref_get to protect object
pointer returned by rxe_pool_get_index() to prevent chance of a
race between get_index and drop_ref by another thread.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 863fa62da077..688944fa3926 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -272,8 +272,13 @@ void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index)
 	}
 
 	elem = xa_load(&pool->xarray.xa, index);
+
 	if (elem) {
-		kref_get(&elem->ref_cnt);
+		/* protect against a race with someone else dropping
+		 * the last reference to the object
+		 */
+		if (!__rxe_add_ref(elem))
+			return NULL;
 		obj = elem->obj;
 	} else {
 		obj = NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes
  2021-11-03  5:02 ` [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes Bob Pearson
@ 2021-11-19 17:38   ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 17:38 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Wed, Nov 03, 2021 at 12:02:34AM -0500, Bob Pearson wrote:
> @@ -342,8 +229,18 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
>  	elem->obj = obj;
>  	kref_init(&elem->ref_cnt);
>  
> +	if (pool->flags & RXE_POOL_INDEX) {
> +		err = xa_alloc_cyclic_bh(&pool->xarray.xa, &elem->index, elem,
> +					 pool->xarray.limit,
> +					 &pool->xarray.next, GFP_ATOMIC);

This uses the _bh lock

>  void rxe_elem_release(struct kref *kref)
> @@ -397,6 +315,9 @@ void rxe_elem_release(struct kref *kref)
>  	struct rxe_pool *pool = elem->pool;
>  	void *obj;
>  
> +	if (pool->flags & RXE_POOL_INDEX)
> +		xa_erase(&pool->xarray.xa, elem->index);

But this doesn't ?

Shouldn't they all be the same?

And why is it a bh lock anyhow? Can you add some comments saying which
APIs here are called from the softirq?

> +void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
>  {
>  	struct rxe_pool_elem *elem;
>  	void *obj;
>  
> +	elem = xa_load(&pool->xarray.xa, index);
> +	if (elem) {
>  		kref_get(&elem->ref_cnt);
>  		obj = elem->obj;
>  	} else {

And why doesn't this use after free elem? This pattern is only safe
when using RCU or when holding the spinlock across the
kref_get.. Since you can't use rcu with the core allocated objects it
seems this needs a xa_lock?

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects
  2021-11-03  5:02 ` [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects Bob Pearson
@ 2021-11-19 17:41   ` Jason Gunthorpe
  2021-11-30 19:35     ` Bob Pearson
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 17:41 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Wed, Nov 03, 2021 at 12:02:38AM -0500, Bob Pearson wrote:
> +	if (mgid->raw[10] == 0xff && mgid->raw[11] == 0xff) {
> +		if ((mgid->raw[12] & 0xf0) != 0xe0)
> +			pr_info("mgid is not an ipv4 mc address\n");
> +
> +		/* mgid is a mapped IPV4 multicast address
> +		 * use the 32 bits as an index which will be
> +		 * unique
> +		 */
> +		index = be32_to_cpu(val32[3]);
> +	} else {
> +		if (mgid->raw[0] != 0xff)
> +			pr_info("mgid is not an ipv6 mc address\n");
> +
> +		/* mgid is an IPV6 multicast address which won't
> +		 * fit into the index so construct the index
> +		 * from the four 32 bit words in mgid.
> +		 * If there is a collision treat it like
> +		 * no memory and return NULL
> +		 */
> +		index = be32_to_cpu(val32[0] ^ val32[1]);
> +		index = (index << 32) | be32_to_cpu(val32[2] ^ val32[3]);

I'm not sure failing is such a good thing, can it chain them on a
linked list or something?

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects
  2021-11-03  5:02 ` [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects Bob Pearson
@ 2021-11-19 17:45   ` Jason Gunthorpe
  2021-11-30 20:05     ` Bob Pearson
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 17:45 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Wed, Nov 03, 2021 at 12:02:39AM -0500, Bob Pearson wrote:
> Currently rxe_add_ref() calls kref_get() which increments the
> reference count even if the object has already been released.
> Change this to refcount_inc_not_zero() which will only succeed
> if the current ref count is larger than zero. This change exposes
> some reference counting errors which will be fixed in the following
> patches.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>  drivers/infiniband/sw/rxe/rxe_pool.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
> index 6cd2366d5407..46f2abc359f3 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
> @@ -7,6 +7,8 @@
>  #ifndef RXE_POOL_H
>  #define RXE_POOL_H
>  
> +#include <linux/refcount.h>
> +
>  enum rxe_pool_flags {
>  	RXE_POOL_AUTO_INDEX	= BIT(1),
>  	RXE_POOL_EXT_INDEX	= BIT(2),
> @@ -70,9 +72,15 @@ int __rxe_pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem);
>  
>  void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index);
>  
> -#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
> +static inline int __rxe_add_ref(struct rxe_pool_elem *elem)
> +{
> +	return refcount_inc_not_zero(&elem->ref_cnt.refcount);
> +}

Don't reach inside a kref to touch the
refcount. kref_get_unless_zero() is the api for krefs

I'm not sure how any of this works, ie rxe_create_qp() stuffs a core
allocated object into the pool, and various places do refcounting on
it

But then rxe_destroy_qp doesn't seem to do anything with the
refcounts, it just blindly lets things go to kfree in the core.

Seems really confused..

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref
  2021-11-03  5:02 ` [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref Bob Pearson
@ 2021-11-19 17:46   ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 17:46 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Wed, Nov 03, 2021 at 12:02:42AM -0500, Bob Pearson wrote:
> Use refcount_inc_not_zero instead of kref_get to protect object
> pointer returned by rxe_pool_get_index() to prevent chance of a
> race between get_index and drop_ref by another thread.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>  drivers/infiniband/sw/rxe/rxe_pool.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index 863fa62da077..688944fa3926 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -272,8 +272,13 @@ void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index)
>  	}
>  
>  	elem = xa_load(&pool->xarray.xa, index);
> +
>  	if (elem) {
> -		kref_get(&elem->ref_cnt);
> +		/* protect against a race with someone else dropping
> +		 * the last reference to the object
> +		 */
> +		if (!__rxe_add_ref(elem))
> +			return NULL;
>  		obj = elem->obj;

That doesn't really work without RCU, since now you just use after
free on the ref_cnt atomic.

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe
  2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
                   ` (12 preceding siblings ...)
  2021-11-03  5:02 ` [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref Bob Pearson
@ 2021-11-19 17:59 ` Jason Gunthorpe
  13 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2021-11-19 17:59 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Wed, Nov 03, 2021 at 12:02:29AM -0500, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> driver.  They mostly relate to races between normal operations and
> destroying objects.  This patch series makes several minor cleanups in
> rxe_pool.[ch] and replaces the red-black trees currently used by xarrays
> which have better atomic behavior.
> 
> This patch series applies cleanly to current for-next.
> commit 6a463bc9d999 ("Merge branch 'for-rc' into rdma.git for-next")
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> v4
>   Restructured patch series to change to xarray earlier which
>   greatly simplified the changes.
>   Rebased to current for-next
> v3
>   Changed rxe_alloc to use GFP_KERNEL
>   Addressed other comments by Jason Gunthorp
>   Merged the previous 06/10 and 07/10 patches into one since they overlapped
>   Added some minor cleanups as 10/10
> v2
>   Rebased to current for-next.
>   Added 4 additional patches
> 
> Bob Pearson (13):
>   RDMA/rxe: Replace irqsave locks with bh locks
>   RDMA/rxe: Cleanup rxe_pool_entry
>   RDMA/rxe: Copy setup parameters into rxe_pool
>   RDMA/rxe: Save object pointer in pool element
>   RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c

I took these patches

>   RDMA/rxe: Replace RB tree by xarray for indexes
>   RDMA/rxe: Remove some #defines from rxe_pool.h
>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Replaced keyed rxe objects by indexed objects
>   RDMA/rxe: Prevent taking references to dead objects
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources
>   RDMA/rxe: Protect against race between get_index and drop_ref

These need some fixing

Thanks,
Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects
  2021-11-19 17:41   ` Jason Gunthorpe
@ 2021-11-30 19:35     ` Bob Pearson
  2021-12-02  7:14       ` Christoph Lameter
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-30 19:35 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 11/19/21 11:41, Jason Gunthorpe wrote:
> On Wed, Nov 03, 2021 at 12:02:38AM -0500, Bob Pearson wrote:
>> +	if (mgid->raw[10] == 0xff && mgid->raw[11] == 0xff) {
>> +		if ((mgid->raw[12] & 0xf0) != 0xe0)
>> +			pr_info("mgid is not an ipv4 mc address\n");
>> +
>> +		/* mgid is a mapped IPV4 multicast address
>> +		 * use the 32 bits as an index which will be
>> +		 * unique
>> +		 */
>> +		index = be32_to_cpu(val32[3]);
>> +	} else {
>> +		if (mgid->raw[0] != 0xff)
>> +			pr_info("mgid is not an ipv6 mc address\n");
>> +
>> +		/* mgid is an IPV6 multicast address which won't
>> +		 * fit into the index so construct the index
>> +		 * from the four 32 bit words in mgid.
>> +		 * If there is a collision treat it like
>> +		 * no memory and return NULL
>> +		 */
>> +		index = be32_to_cpu(val32[0] ^ val32[1]);
>> +		index = (index << 32) | be32_to_cpu(val32[2] ^ val32[3]);
> 
> I'm not sure failing is such a good thing, can it chain them on a
> linked list or something?
> 
> Jason
> 

I had developed a version of the proposed patch which did what you suggest and felt that
it wasn't an improvement over the current code. Just too complicated. This patch is only
marginally cleaner than what is already there. I am leaning towards just leaving the
current red-black tree based code in place for mgids which don't fit nicely into xarrays
which however are quite a bit better for 32 bit indices (qpns, qkeys, rkeys, etc.)
Multicast is not an important use case for RoCE since plain IP is a better solution.
Unless you think otherwise we should just drop this patch.

Bob

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects
  2021-11-19 17:45   ` Jason Gunthorpe
@ 2021-11-30 20:05     ` Bob Pearson
  2021-12-01 13:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 23+ messages in thread
From: Bob Pearson @ 2021-11-30 20:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: zyjzyj2000, linux-rdma

On 11/19/21 11:45, Jason Gunthorpe wrote:
> On Wed, Nov 03, 2021 at 12:02:39AM -0500, Bob Pearson wrote:
>> Currently rxe_add_ref() calls kref_get() which increments the
>> reference count even if the object has already been released.
>> Change this to refcount_inc_not_zero() which will only succeed
>> if the current ref count is larger than zero. This change exposes
>> some reference counting errors which will be fixed in the following
>> patches.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>  drivers/infiniband/sw/rxe/rxe_pool.h | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
>> index 6cd2366d5407..46f2abc359f3 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.h
>> @@ -7,6 +7,8 @@
>>  #ifndef RXE_POOL_H
>>  #define RXE_POOL_H
>>  
>> +#include <linux/refcount.h>
>> +
>>  enum rxe_pool_flags {
>>  	RXE_POOL_AUTO_INDEX	= BIT(1),
>>  	RXE_POOL_EXT_INDEX	= BIT(2),
>> @@ -70,9 +72,15 @@ int __rxe_pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem);
>>  
>>  void *rxe_pool_get_index(struct rxe_pool *pool, unsigned long index);
>>  
>> -#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt)
>> +static inline int __rxe_add_ref(struct rxe_pool_elem *elem)
>> +{
>> +	return refcount_inc_not_zero(&elem->ref_cnt.refcount);
>> +}
> 
> Don't reach inside a kref to touch the
> refcount. kref_get_unless_zero() is the api for krefs

Yes. Thanks for pointing this out. I had started using the refcount APIs instead of the
kref APIs for a different one that is not supported by kref and then quit using it.
This is a cleaner solution.
> 
> I'm not sure how any of this works, ie rxe_create_qp() stuffs a core
> allocated object into the pool, and various places do refcounting on
> it
I have been having a long running discussion with Leon about rxe's use of ref counting.
His opinion is that it isn't required at all and that rdma-core is doing enough. I think
that we either have to push the problem onto the apps (e.g. return EAGAIN or EBUSY) or
let the driver sleep if there are currently active packets being processed with a mutex
or semaphore. Additionally we need to cleanup some bad code that makes things not work as
mentioned.
 
> 
> But then rxe_destroy_qp doesn't seem to do anything with the
> refcounts, it just blindly lets things go to kfree in the core.
> 
> Seems really confused..
> 
> Jason
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects
  2021-11-30 20:05     ` Bob Pearson
@ 2021-12-01 13:55       ` Jason Gunthorpe
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2021-12-01 13:55 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma

On Tue, Nov 30, 2021 at 02:05:03PM -0600, Bob Pearson wrote:

> let the driver sleep if there are currently active packets being processed with a mutex
> or semaphore. Additionally we need to cleanup some bad code that makes things not work as
> mentioned.

Drivers have to sleep in their destroy to fully fence any activity
related to the destroying object.

It can't keep operating the object after destroy returns, that is not
our semantic

Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects
  2021-11-30 19:35     ` Bob Pearson
@ 2021-12-02  7:14       ` Christoph Lameter
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2021-12-02  7:14 UTC (permalink / raw)
  To: Bob Pearson; +Cc: Jason Gunthorpe, zyjzyj2000, linux-rdma

Hi Bob, good to hear from you

On Tue, 30 Nov 2021, Bob Pearson wrote:

> Multicast is not an important use case for RoCE since plain IP is a better solution.

Multicast is an important use case for ROCE especially in the area of
finance and there is commercial middleware that depends on it. Plain IP
suffers from the high kernel overhead. High packet rates are only
possible through the RDMA apis.

Greetings,
   Christoph



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-12-02  7:14 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-03  5:02 [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 01/13] RDMA/rxe: Replace irqsave locks with bh locks Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 02/13] RDMA/rxe: Cleanup rxe_pool_entry Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 03/13] RDMA/rxe: Copy setup parameters into rxe_pool Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 04/13] RDMA/rxe: Save object pointer in pool element Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 05/13] RDMA/rxe: Replace RB tree by xarray for indexes Bob Pearson
2021-11-19 17:38   ` Jason Gunthorpe
2021-11-03  5:02 ` [PATCH for-next v4 06/13] RDMA/rxe: Remove #include "rxe_loc.h" from rxe_pool.c Bob Pearson
2021-11-03  5:02 ` [PATCH v4 for-next 07/13] RDMA/rxe: Remove some #defines from rxe_pool.h Bob Pearson
2021-11-03  5:02 ` [PATCH v4 for-next 08/13] RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 09/13] RDMA/rxe: Replaced keyed rxe objects by indexed objects Bob Pearson
2021-11-19 17:41   ` Jason Gunthorpe
2021-11-30 19:35     ` Bob Pearson
2021-12-02  7:14       ` Christoph Lameter
2021-11-03  5:02 ` [PATCH for-next v4 10/13] RDMA/rxe: Prevent taking references to dead objects Bob Pearson
2021-11-19 17:45   ` Jason Gunthorpe
2021-11-30 20:05     ` Bob Pearson
2021-12-01 13:55       ` Jason Gunthorpe
2021-11-03  5:02 ` [PATCH for-next v4 11/13] RDMA/rxe: Fix ref error in rxe_av.c Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 12/13] RDMA/rxe: Replace mr by rkey in responder resources Bob Pearson
2021-11-03  5:02 ` [PATCH for-next v4 13/13] RDMA/rxe: Protect against race between get_index and drop_ref Bob Pearson
2021-11-19 17:46   ` Jason Gunthorpe
2021-11-19 17:59 ` [PATCH for-next v4 00/13] Correct race conditions in rdma_rxe Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).