* [PATCH 00/10] Refactor rxe driver to remove multiple race conditions
@ 2019-07-22 15:14 Maksym Planeta
  2019-07-22 15:14 ` [PATCH 01/10] Simplify rxe_run_task interface Maksym Planeta
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

This patchset gets rid of the following race conditions:

  1. Tasklet functions incremented the reference count only after the
  tasklet had already started running.
  2. A pointer to a reference-counted object (kref) was obtained without
  protecting kref_put with a lock.
  3. QP cleanup sometimes scheduled further cleanup for later execution in
  rxe_qp_do_cleanup, although the QP's memory could be freed immediately
  after returning from rxe_qp_cleanup.
  4. Non-atomic cleanup functions could be called in SoftIRQ context.
  5. The reference counter could be manipulated inside a critical section
  both inside and outside of a SoftIRQ region. Such behavior can end up in
  a deadlock.

The easiest way to observe these problems is to compile the kernel with
KASAN and lockdep and abruptly stop an application using SoftRoCE during
the communication phase. On my system this often resulted in a kernel
crash or a deadlock inside the kernel.
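For reference, a debug configuration along these lines should make the
failures visible (a sketch; the exact option set is a suggestion, not
necessarily the configuration that was used):

  CONFIG_DEBUG_KERNEL=y
  CONFIG_KASAN=y
  CONFIG_PROVE_LOCKING=y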

To fix the above mentioned problems, this patch series does the following:

  1. Replaces tasklets with workqueues
  2. Adds locking around kref_put
  3. Acquires references at the appropriate points
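To illustrate point 2, here is a minimal sketch of the locked kref_put
pattern that patches 04 and 07 introduce (simplified; the actual code is
in rxe_pool.c):

  static void rxe_dummy_release(struct kref *kref)
  {
          /* the real release is deferred until the pool lock is dropped */
  }

  int rxe_drop_ref(struct rxe_pool_entry *pelem)
  {
          struct rxe_pool *pool = pelem->pool;
          unsigned long flags;
          int res;

          /* pool_lock excludes rxe_pool_get_key(), which calls
           * kref_get() under the same lock, so a lookup can no longer
           * revive an object whose refcount has already hit zero
           */
          write_lock_irqsave(&pool->pool_lock, flags);
          res = kref_put(&pelem->ref_cnt, rxe_dummy_release);
          write_unlock_irqrestore(&pool->pool_lock, flags);

          if (res)
                  rxe_elem_release(&pelem->ref_cnt);

          return res;
  }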

As a shortcoming, performance is slightly reduced: instead of trying to
execute the task function directly, the new version always puts it onto
the queue.

To be honest, I'm not sure that I removed all of the problems, but the
driver definitely behaves much more stably now. I would be glad to get
some help with additional testing.

 drivers/infiniband/sw/rxe/rxe_comp.c        |  38 ++----
 drivers/infiniband/sw/rxe/rxe_cq.c          |  17 ++-
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |   1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |   1 -
 drivers/infiniband/sw/rxe/rxe_loc.h         |   3 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c       |  22 ++--
 drivers/infiniband/sw/rxe/rxe_mr.c          |  10 +-
 drivers/infiniband/sw/rxe/rxe_net.c         |  21 ++-
 drivers/infiniband/sw/rxe/rxe_pool.c        |  40 ++++--
 drivers/infiniband/sw/rxe/rxe_pool.h        |  16 ++-
 drivers/infiniband/sw/rxe/rxe_qp.c          | 130 +++++++++---------
 drivers/infiniband/sw/rxe/rxe_recv.c        |   8 +-
 drivers/infiniband/sw/rxe/rxe_req.c         |  17 +--
 drivers/infiniband/sw/rxe/rxe_resp.c        |  54 ++++----
 drivers/infiniband/sw/rxe/rxe_task.c        | 139 +++++++-------------
 drivers/infiniband/sw/rxe/rxe_task.h        |  40 ++----
 drivers/infiniband/sw/rxe/rxe_verbs.c       |  81 ++++++------
 drivers/infiniband/sw/rxe/rxe_verbs.h       |   8 +-
 18 files changed, 302 insertions(+), 344 deletions(-)

-- 
2.20.1



* [PATCH 01/10] Simplify rxe_run_task interface
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 02/10] Remove counter that does not have a meaning anymore Maksym Planeta
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Make rxe_run_task only schedule tasks and never run them directly.

This simplification will be used in further patches.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  | 16 ++++++++--------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  2 +-
 drivers/infiniband/sw/rxe/rxe_net.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_req.c   |  6 +++---
 drivers/infiniband/sw/rxe/rxe_resp.c  |  8 +-------
 drivers/infiniband/sw/rxe/rxe_task.c  |  7 ++-----
 drivers/infiniband/sw/rxe/rxe_task.h  |  6 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |  8 ++++----
 9 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 116cafc9afcf..ad09bd9d0a82 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -142,7 +142,7 @@ void retransmit_timer(struct timer_list *t)
 
 	if (qp->valid) {
 		qp->comp.timeout = 1;
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_run_task(&qp->comp.task);
 	}
 }
 
@@ -156,7 +156,7 @@ void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 	if (must_sched != 0)
 		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
 
-	rxe_run_task(&qp->comp.task, must_sched);
+	rxe_run_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task, 1);
+						rxe_run_task(&qp->req.task);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task);
 	}
 }
 
@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task);
 		}
 	}
 
@@ -525,7 +525,7 @@ static inline enum comp_state complete_wqe(struct rxe_qp *qp,
 
 		if (qp->req.wait_psn) {
 			qp->req.wait_psn = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task);
 		}
 	}
 
@@ -644,7 +644,7 @@ int rxe_completer(void *arg)
 
 			if (qp->req.wait_psn) {
 				qp->req.wait_psn = 0;
-				rxe_run_task(&qp->req.task, 1);
+				rxe_run_task(&qp->req.task);
 			}
 
 			state = COMPST_DONE;
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task, 1);
+					rxe_run_task(&qp->req.task);
 				}
 
 				if (pkt) {
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 775c23becaec..b7159d9d5107 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -277,7 +277,7 @@ static inline int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	if ((qp_type(qp) != IB_QPT_RC) &&
 	    (pkt->mask & RXE_END_MASK)) {
 		pkt->wqe->state = wqe_state_done;
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_run_task(&qp->comp.task);
 	}
 
 	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 5a3474f9351b..e50f19fadcf9 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -414,7 +414,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
 
 	if (unlikely(qp->need_req_skb &&
 		     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task);
 
 	rxe_drop_ref(qp);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index e2c6d1cedf41..623f44f1d1d5 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -560,10 +560,10 @@ static void rxe_qp_drain(struct rxe_qp *qp)
 		if (qp->req.state != QP_STATE_DRAINED) {
 			qp->req.state = QP_STATE_DRAIN;
 			if (qp_type(qp) == IB_QPT_RC)
-				rxe_run_task(&qp->comp.task, 1);
+				rxe_run_task(&qp->comp.task);
 			else
 				__rxe_do_task(&qp->comp.task);
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task);
 		}
 	}
 }
@@ -576,13 +576,13 @@ void rxe_qp_error(struct rxe_qp *qp)
 	qp->attr.qp_state = IB_QPS_ERR;
 
 	/* drain work and packet queues */
-	rxe_run_task(&qp->resp.task, 1);
+	rxe_run_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC)
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_run_task(&qp->comp.task);
 	else
 		__rxe_do_task(&qp->comp.task);
-	rxe_run_task(&qp->req.task, 1);
+	rxe_run_task(&qp->req.task);
 }
 
 /* called by the modify qp verb */
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index c5d9b558fa90..a84c0407545b 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -126,7 +126,7 @@ void rnr_nak_timer(struct timer_list *t)
 	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
 
 	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
-	rxe_run_task(&qp->req.task, 1);
+	rxe_run_task(&qp->req.task);
 }
 
 static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
@@ -651,7 +651,7 @@ int rxe_requester(void *arg)
 		}
 		if ((wqe->wr.send_flags & IB_SEND_SIGNALED) ||
 		    qp->sq_sig_type == IB_SIGNAL_ALL_WR)
-			rxe_run_task(&qp->comp.task, 1);
+			rxe_run_task(&qp->comp.task);
 		qp->req.wqe_index = next_index(qp->sq.queue,
 						qp->req.wqe_index);
 		goto next_wqe;
@@ -736,7 +736,7 @@ int rxe_requester(void *arg)
 		rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
 
 		if (ret == -EAGAIN) {
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task);
 			goto exit;
 		}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 1cbfbd98eb22..d4b5535b8517 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -106,15 +106,9 @@ static char *resp_state_name[] = {
 /* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
 
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	rxe_run_task(&qp->resp.task, must_sched);
+	rxe_run_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 08f05ac5f5d5..7c2b6e4595f5 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -151,15 +151,12 @@ void rxe_cleanup_task(struct rxe_task *task)
 	tasklet_kill(&task->tasklet);
 }
 
-void rxe_run_task(struct rxe_task *task, int sched)
+void rxe_run_task(struct rxe_task *task)
 {
 	if (task->destroyed)
 		return;
 
-	if (sched)
-		tasklet_schedule(&task->tasklet);
-	else
-		rxe_do_task((unsigned long)task);
+	tasklet_schedule(&task->tasklet);
 }
 
 void rxe_disable_task(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 08ff42d451c6..5c1fc7d5b953 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -82,10 +82,10 @@ int __rxe_do_task(struct rxe_task *task);
  */
 void rxe_do_task(unsigned long data);
 
-/* run a task, else schedule it to run as a tasklet, The decision
- * to run or schedule tasklet is based on the parameter sched.
+/*
+ * schedule task to run as a tasklet.
  */
-void rxe_run_task(struct rxe_task *task, int sched);
+void rxe_run_task(struct rxe_task *task);
 
 /* keep a task from scheduling */
 void rxe_disable_task(struct rxe_task *task);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4ebdfcf4d33e..f6b25b409d12 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -708,9 +708,9 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, const struct ib_send_wr *wr,
 		wr = wr->next;
 	}
 
-	rxe_run_task(&qp->req.task, 1);
+	rxe_run_task(&qp->req.task);
 	if (unlikely(qp->req.state == QP_STATE_ERROR))
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_run_task(&qp->comp.task);
 
 	return err;
 }
@@ -732,7 +732,7 @@ static int rxe_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 
 	if (qp->is_user) {
 		/* Utilize process context to do protocol processing */
-		rxe_run_task(&qp->req.task, 0);
+		rxe_run_task(&qp->req.task);
 		return 0;
 	} else
 		return rxe_post_send_kernel(qp, wr, bad_wr);
@@ -772,7 +772,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 	spin_unlock_irqrestore(&rq->producer_lock, flags);
 
 	if (qp->resp.state == QP_STATE_ERROR)
-		rxe_run_task(&qp->resp.task, 1);
+		rxe_run_task(&qp->resp.task);
 
 err1:
 	return err;
-- 
2.20.1



* [PATCH 02/10] Remove counter that does not have a meaning anymore
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
  2019-07-22 15:14 ` [PATCH 01/10] Simplify rxe_run_task interface Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 03/10] Make pool interface more type safe Maksym Planeta
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Now that all tasks are always scheduled, this counter no longer has any
meaning.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 6 ------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c | 1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h | 1 -
 3 files changed, 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index ad09bd9d0a82..5f3a43cfeb63 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -148,14 +148,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
 
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
 	rxe_run_task(&qp->comp.task);
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index 636edb5f4cf4..9bc762b0e66a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -41,7 +41,6 @@ static const char * const rxe_counter_name[] = {
 	[RXE_CNT_RCV_RNR]             =  "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR]             =  "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR]         =  "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED]     =  "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED]      =  "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED]  =  "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY]          =  "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 72c0d63c79e0..cfe19159a77f 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -45,7 +45,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
-- 
2.20.1



* [PATCH 03/10] Make pool interface more type safe
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
  2019-07-22 15:14 ` [PATCH 01/10] Simplify rxe_run_task interface Maksym Planeta
  2019-07-22 15:14 ` [PATCH 02/10] Remove counter that does not have a meaning anymore Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 04/10] Protect kref_put with the lock Maksym Planeta
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Replace void * with struct rxe_pool_entry * in the signatures of several
pool functions.

Change the reference-counting macros to inline functions.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  | 18 ++++----
 drivers/infiniband/sw/rxe/rxe_mcast.c | 20 ++++----
 drivers/infiniband/sw/rxe/rxe_mr.c    |  8 ++--
 drivers/infiniband/sw/rxe/rxe_net.c   |  6 +--
 drivers/infiniband/sw/rxe/rxe_pool.c  | 12 ++---
 drivers/infiniband/sw/rxe/rxe_pool.h  | 16 ++++---
 drivers/infiniband/sw/rxe/rxe_qp.c    | 32 ++++++-------
 drivers/infiniband/sw/rxe/rxe_recv.c  |  8 ++--
 drivers/infiniband/sw/rxe/rxe_req.c   |  8 ++--
 drivers/infiniband/sw/rxe/rxe_resp.c  | 22 ++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.c | 66 +++++++++++++--------------
 11 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 5f3a43cfeb63..bdeb288673f3 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -534,7 +534,7 @@ static void rxe_drain_resp_pkts(struct rxe_qp *qp, bool notify)
 	struct rxe_send_wqe *wqe;
 
 	while ((skb = skb_dequeue(&qp->resp_pkts))) {
-		rxe_drop_ref(qp);
+		rxe_drop_ref(&qp->pelem);
 		kfree_skb(skb);
 	}
 
@@ -557,7 +557,7 @@ int rxe_completer(void *arg)
 	struct rxe_pkt_info *pkt = NULL;
 	enum comp_state state;
 
-	rxe_add_ref(qp);
+	rxe_add_ref(&qp->pelem);
 
 	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
 	    qp->req.state == QP_STATE_RESET) {
@@ -646,7 +646,7 @@ int rxe_completer(void *arg)
 
 		case COMPST_DONE:
 			if (pkt) {
-				rxe_drop_ref(pkt->qp);
+				rxe_drop_ref(&pkt->qp->pelem);
 				kfree_skb(skb);
 				skb = NULL;
 			}
@@ -694,7 +694,7 @@ int rxe_completer(void *arg)
 			if (qp->comp.started_retry &&
 			    !qp->comp.timeout_retry) {
 				if (pkt) {
-					rxe_drop_ref(pkt->qp);
+					rxe_drop_ref(&pkt->qp->pelem);
 					kfree_skb(skb);
 					skb = NULL;
 				}
@@ -723,7 +723,7 @@ int rxe_completer(void *arg)
 				}
 
 				if (pkt) {
-					rxe_drop_ref(pkt->qp);
+					rxe_drop_ref(&pkt->qp->pelem);
 					kfree_skb(skb);
 					skb = NULL;
 				}
@@ -748,7 +748,7 @@ int rxe_completer(void *arg)
 				mod_timer(&qp->rnr_nak_timer,
 					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
 						& ~AETH_TYPE_MASK));
-				rxe_drop_ref(pkt->qp);
+				rxe_drop_ref(&pkt->qp->pelem);
 				kfree_skb(skb);
 				skb = NULL;
 				goto exit;
@@ -766,7 +766,7 @@ int rxe_completer(void *arg)
 			rxe_qp_error(qp);
 
 			if (pkt) {
-				rxe_drop_ref(pkt->qp);
+				rxe_drop_ref(&pkt->qp->pelem);
 				kfree_skb(skb);
 				skb = NULL;
 			}
@@ -780,7 +780,7 @@ int rxe_completer(void *arg)
 	 * exit from the loop calling us
 	 */
 	WARN_ON_ONCE(skb);
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 	return -EAGAIN;
 
 done:
@@ -788,6 +788,6 @@ int rxe_completer(void *arg)
 	 * us again to see if there is anything else to do
 	 */
 	WARN_ON_ONCE(skb);
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 522a7942c56c..24ebc4ca1b99 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -59,7 +59,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	spin_lock_init(&grp->mcg_lock);
 	grp->rxe = rxe;
 
-	rxe_add_key(grp, mgid);
+	rxe_add_key(&grp->pelem, mgid);
 
 	err = rxe_mcast_add(rxe, mgid);
 	if (err)
@@ -70,7 +70,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
 	return 0;
 
 err2:
-	rxe_drop_ref(grp);
+	rxe_drop_ref(&grp->pelem);
 err1:
 	return err;
 }
@@ -103,7 +103,7 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 	}
 
 	/* each qp holds a ref on the grp */
-	rxe_add_ref(grp);
+	rxe_add_ref(&grp->pelem);
 
 	grp->num_qp++;
 	elem->qp = qp;
@@ -140,16 +140,16 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 			spin_unlock_bh(&grp->mcg_lock);
 			spin_unlock_bh(&qp->grp_lock);
-			rxe_drop_ref(elem);
-			rxe_drop_ref(grp);	/* ref held by QP */
-			rxe_drop_ref(grp);	/* ref from get_key */
+			rxe_drop_ref(&elem->pelem);
+			rxe_drop_ref(&grp->pelem);	/* ref held by QP */
+			rxe_drop_ref(&grp->pelem);	/* ref from get_key */
 			return 0;
 		}
 	}
 
 	spin_unlock_bh(&grp->mcg_lock);
 	spin_unlock_bh(&qp->grp_lock);
-	rxe_drop_ref(grp);			/* ref from get_key */
+	rxe_drop_ref(&grp->pelem);			/* ref from get_key */
 err1:
 	return -EINVAL;
 }
@@ -175,8 +175,8 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
 		list_del(&elem->qp_list);
 		grp->num_qp--;
 		spin_unlock_bh(&grp->mcg_lock);
-		rxe_drop_ref(grp);
-		rxe_drop_ref(elem);
+		rxe_drop_ref(&grp->pelem);
+		rxe_drop_ref(&elem->pelem);
 	}
 }
 
@@ -185,6 +185,6 @@ void rxe_mc_cleanup(struct rxe_pool_entry *arg)
 	struct rxe_mc_grp *grp = container_of(arg, typeof(*grp), pelem);
 	struct rxe_dev *rxe = grp->rxe;
 
-	rxe_drop_key(grp);
+	rxe_drop_key(&grp->pelem);
 	rxe_mcast_delete(rxe, &grp->mgid);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index ea6a819b7167..441b01e30afa 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -470,7 +470,7 @@ int copy_data(
 
 		if (offset >= sge->length) {
 			if (mem) {
-				rxe_drop_ref(mem);
+				rxe_drop_ref(&mem->pelem);
 				mem = NULL;
 			}
 			sge++;
@@ -515,13 +515,13 @@ int copy_data(
 	dma->resid	= resid;
 
 	if (mem)
-		rxe_drop_ref(mem);
+		rxe_drop_ref(&mem->pelem);
 
 	return 0;
 
 err2:
 	if (mem)
-		rxe_drop_ref(mem);
+		rxe_drop_ref(&mem->pelem);
 err1:
 	return err;
 }
@@ -581,7 +581,7 @@ struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
 		     mem->pd != pd ||
 		     (access && !(access & mem->access)) ||
 		     mem->state != RXE_MEM_STATE_VALID)) {
-		rxe_drop_ref(mem);
+		rxe_drop_ref(&mem->pelem);
 		mem = NULL;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index e50f19fadcf9..f8c3604bc5ad 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -416,7 +416,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
 		     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
 		rxe_run_task(&qp->req.task);
 
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 }
 
 int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
@@ -426,7 +426,7 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	skb->destructor = rxe_skb_tx_dtor;
 	skb->sk = pkt->qp->sk->sk;
 
-	rxe_add_ref(pkt->qp);
+	rxe_add_ref(&pkt->qp->pelem);
 	atomic_inc(&pkt->qp->skb_out);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -436,7 +436,7 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
 	} else {
 		pr_err("Unknown layer 3 protocol: %d\n", skb->protocol);
 		atomic_dec(&pkt->qp->skb_out);
-		rxe_drop_ref(pkt->qp);
+		rxe_drop_ref(&pkt->qp->pelem);
 		kfree_skb(skb);
 		return -EINVAL;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index fbcbac52290b..efa9bab01e02 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -338,9 +338,8 @@ static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new)
 	return;
 }
 
-void rxe_add_key(void *arg, void *key)
+void rxe_add_key(struct rxe_pool_entry *elem, void *key)
 {
-	struct rxe_pool_entry *elem = arg;
 	struct rxe_pool *pool = elem->pool;
 	unsigned long flags;
 
@@ -350,9 +349,8 @@ void rxe_add_key(void *arg, void *key)
 	write_unlock_irqrestore(&pool->pool_lock, flags);
 }
 
-void rxe_drop_key(void *arg)
+void rxe_drop_key(struct rxe_pool_entry *elem)
 {
-	struct rxe_pool_entry *elem = arg;
 	struct rxe_pool *pool = elem->pool;
 	unsigned long flags;
 
@@ -361,9 +359,8 @@ void rxe_drop_key(void *arg)
 	write_unlock_irqrestore(&pool->pool_lock, flags);
 }
 
-void rxe_add_index(void *arg)
+void rxe_add_index(struct rxe_pool_entry *elem)
 {
-	struct rxe_pool_entry *elem = arg;
 	struct rxe_pool *pool = elem->pool;
 	unsigned long flags;
 
@@ -373,9 +370,8 @@ void rxe_add_index(void *arg)
 	write_unlock_irqrestore(&pool->pool_lock, flags);
 }
 
-void rxe_drop_index(void *arg)
+void rxe_drop_index(struct rxe_pool_entry *elem)
 {
-	struct rxe_pool_entry *elem = arg;
 	struct rxe_pool *pool = elem->pool;
 	unsigned long flags;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 2f2cff1cbe43..5c6a9429f541 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -138,18 +138,18 @@ int rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem);
 /* assign an index to an indexed object and insert object into
  *  pool's rb tree
  */
-void rxe_add_index(void *elem);
+void rxe_add_index(struct rxe_pool_entry *elem);
 
 /* drop an index and remove object from rb tree */
-void rxe_drop_index(void *elem);
+void rxe_drop_index(struct rxe_pool_entry *elem);
 
 /* assign a key to a keyed object and insert object into
  *  pool's rb tree
  */
-void rxe_add_key(void *elem, void *key);
+void rxe_add_key(struct rxe_pool_entry *elem, void *key);
 
 /* remove elem from rb tree */
-void rxe_drop_key(void *elem);
+void rxe_drop_key(struct rxe_pool_entry *elem);
 
 /* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
@@ -161,9 +161,13 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
 void rxe_elem_release(struct kref *kref);
 
 /* take a reference on an object */
-#define rxe_add_ref(elem) kref_get(&(elem)->pelem.ref_cnt)
+static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
+	kref_get(&pelem->ref_cnt);
+}
 
 /* drop a reference on an object */
-#define rxe_drop_ref(elem) kref_put(&(elem)->pelem.ref_cnt, rxe_elem_release)
+static inline void rxe_drop_ref(struct rxe_pool_entry *pelem) {
+	kref_put(&pelem->ref_cnt, rxe_elem_release);
+}
 
 #endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 623f44f1d1d5..7cd929185581 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -152,11 +152,11 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 {
 	if (res->type == RXE_ATOMIC_MASK) {
-		rxe_drop_ref(qp);
+		rxe_drop_ref(&qp->pelem);
 		kfree_skb(res->atomic.skb);
 	} else if (res->type == RXE_READ_MASK) {
 		if (res->read.mr)
-			rxe_drop_ref(res->read.mr);
+			rxe_drop_ref(&res->read.mr->pelem);
 	}
 	res->type = 0;
 }
@@ -344,11 +344,11 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 	struct rxe_cq *scq = to_rcq(init->send_cq);
 	struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL;
 
-	rxe_add_ref(pd);
-	rxe_add_ref(rcq);
-	rxe_add_ref(scq);
+	rxe_add_ref(&pd->pelem);
+	rxe_add_ref(&rcq->pelem);
+	rxe_add_ref(&scq->pelem);
 	if (srq)
-		rxe_add_ref(srq);
+		rxe_add_ref(&srq->pelem);
 
 	qp->pd			= pd;
 	qp->rcq			= rcq;
@@ -374,10 +374,10 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 	rxe_queue_cleanup(qp->sq.queue);
 err1:
 	if (srq)
-		rxe_drop_ref(srq);
-	rxe_drop_ref(scq);
-	rxe_drop_ref(rcq);
-	rxe_drop_ref(pd);
+		rxe_drop_ref(&srq->pelem);
+	rxe_drop_ref(&scq->pelem);
+	rxe_drop_ref(&rcq->pelem);
+	rxe_drop_ref(&pd->pelem);
 
 	return err;
 }
@@ -536,7 +536,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	qp->resp.sent_psn_nak = 0;
 
 	if (qp->resp.mr) {
-		rxe_drop_ref(qp->resp.mr);
+		rxe_drop_ref(&qp->resp.mr->pelem);
 		qp->resp.mr = NULL;
 	}
 
@@ -812,20 +812,20 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 		rxe_queue_cleanup(qp->sq.queue);
 
 	if (qp->srq)
-		rxe_drop_ref(qp->srq);
+		rxe_drop_ref(&qp->srq->pelem);
 
 	if (qp->rq.queue)
 		rxe_queue_cleanup(qp->rq.queue);
 
 	if (qp->scq)
-		rxe_drop_ref(qp->scq);
+		rxe_drop_ref(&qp->scq->pelem);
 	if (qp->rcq)
-		rxe_drop_ref(qp->rcq);
+		rxe_drop_ref(&qp->rcq->pelem);
 	if (qp->pd)
-		rxe_drop_ref(qp->pd);
+		rxe_drop_ref(&qp->pd->pelem);
 
 	if (qp->resp.mr) {
-		rxe_drop_ref(qp->resp.mr);
+		rxe_drop_ref(&qp->resp.mr->pelem);
 		qp->resp.mr = NULL;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index f9a492ed900b..bd8dc133a722 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -261,7 +261,7 @@ static int hdr_check(struct rxe_pkt_info *pkt)
 	return 0;
 
 err2:
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 err1:
 	return -EINVAL;
 }
@@ -316,13 +316,13 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
 			skb_get(skb);
 
 		pkt->qp = qp;
-		rxe_add_ref(qp);
+		rxe_add_ref(&qp->pelem);
 		rxe_rcv_pkt(pkt, skb);
 	}
 
 	spin_unlock_bh(&mcg->mcg_lock);
 
-	rxe_drop_ref(mcg);	/* drop ref from rxe_pool_get_key. */
+	rxe_drop_ref(&mcg->pelem);	/* drop ref from rxe_pool_get_key. */
 
 err1:
 	kfree_skb(skb);
@@ -415,7 +415,7 @@ void rxe_rcv(struct sk_buff *skb)
 
 drop:
 	if (pkt->qp)
-		rxe_drop_ref(pkt->qp);
+		rxe_drop_ref(&pkt->qp->pelem);
 
 	kfree_skb(skb);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index a84c0407545b..0d018adbe512 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -594,7 +594,7 @@ int rxe_requester(void *arg)
 	struct rxe_send_wqe rollback_wqe;
 	u32 rollback_psn;
 
-	rxe_add_ref(qp);
+	rxe_add_ref(&qp->pelem);
 
 next_wqe:
 	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
@@ -633,7 +633,7 @@ int rxe_requester(void *arg)
 				goto exit;
 			}
 			rmr->state = RXE_MEM_STATE_FREE;
-			rxe_drop_ref(rmr);
+			rxe_drop_ref(&rmr->pelem);
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
 		} else if (wqe->wr.opcode == IB_WR_REG_MR) {
@@ -702,7 +702,7 @@ int rxe_requester(void *arg)
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
 			__rxe_do_task(&qp->comp.task);
-			rxe_drop_ref(qp);
+			rxe_drop_ref(&qp->pelem);
 			return 0;
 		}
 		payload = mtu;
@@ -753,6 +753,6 @@ int rxe_requester(void *arg)
 	__rxe_do_task(&qp->comp.task);
 
 exit:
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 	return -EAGAIN;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index d4b5535b8517..f038bae1bd70 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -118,7 +118,7 @@ static inline enum resp_states get_req(struct rxe_qp *qp,
 
 	if (qp->resp.state == QP_STATE_ERROR) {
 		while ((skb = skb_dequeue(&qp->req_pkts))) {
-			rxe_drop_ref(qp);
+			rxe_drop_ref(&qp->pelem);
 			kfree_skb(skb);
 		}
 
@@ -494,7 +494,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 
 err:
 	if (mem)
-		rxe_drop_ref(mem);
+		rxe_drop_ref(&mem->pelem);
 	return state;
 }
 
@@ -910,7 +910,7 @@ static enum resp_states do_complete(struct rxe_qp *qp,
 					return RESPST_ERROR;
 				}
 				rmr->state = RXE_MEM_STATE_FREE;
-				rxe_drop_ref(rmr);
+				rxe_drop_ref(&rmr->pelem);
 			}
 
 			wc->qp			= &qp->ibqp;
@@ -980,7 +980,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 		goto out;
 	}
 
-	rxe_add_ref(qp);
+	rxe_add_ref(&qp->pelem);
 
 	res = &qp->resp.resources[qp->resp.res_head];
 	free_rd_atomic_resource(qp, res);
@@ -1000,7 +1000,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	rc = rxe_xmit_packet(qp, &ack_pkt, skb);
 	if (rc) {
 		pr_err_ratelimited("Failed sending ack\n");
-		rxe_drop_ref(qp);
+		rxe_drop_ref(&qp->pelem);
 	}
 out:
 	return rc;
@@ -1029,12 +1029,12 @@ static enum resp_states cleanup(struct rxe_qp *qp,
 
 	if (pkt) {
 		skb = skb_dequeue(&qp->req_pkts);
-		rxe_drop_ref(qp);
+		rxe_drop_ref(&qp->pelem);
 		kfree_skb(skb);
 	}
 
 	if (qp->resp.mr) {
-		rxe_drop_ref(qp->resp.mr);
+		rxe_drop_ref(&qp->resp.mr->pelem);
 		qp->resp.mr = NULL;
 	}
 
@@ -1180,7 +1180,7 @@ static enum resp_states do_class_d1e_error(struct rxe_qp *qp)
 		}
 
 		if (qp->resp.mr) {
-			rxe_drop_ref(qp->resp.mr);
+			rxe_drop_ref(&qp->resp.mr->pelem);
 			qp->resp.mr = NULL;
 		}
 
@@ -1193,7 +1193,7 @@ static void rxe_drain_req_pkts(struct rxe_qp *qp, bool notify)
 	struct sk_buff *skb;
 
 	while ((skb = skb_dequeue(&qp->req_pkts))) {
-		rxe_drop_ref(qp);
+		rxe_drop_ref(&qp->pelem);
 		kfree_skb(skb);
 	}
 
@@ -1212,7 +1212,7 @@ int rxe_responder(void *arg)
 	struct rxe_pkt_info *pkt = NULL;
 	int ret = 0;
 
-	rxe_add_ref(qp);
+	rxe_add_ref(&qp->pelem);
 
 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
@@ -1392,6 +1392,6 @@ int rxe_responder(void *arg)
 exit:
 	ret = -EAGAIN;
 done:
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 	return ret;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f6b25b409d12..545eff108070 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -154,7 +154,7 @@ static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
 {
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
-	rxe_drop_ref(uc);
+	rxe_drop_ref(&uc->pelem);
 }
 
 static int rxe_port_immutable(struct ib_device *dev, u8 port_num,
@@ -188,7 +188,7 @@ static void rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 {
 	struct rxe_pd *pd = to_rpd(ibpd);
 
-	rxe_drop_ref(pd);
+	rxe_drop_ref(&pd->pelem);
 }
 
 static int rxe_create_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr,
@@ -239,7 +239,7 @@ static void rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
 {
 	struct rxe_ah *ah = to_rah(ibah);
 
-	rxe_drop_ref(ah);
+	rxe_drop_ref(&ah->pelem);
 }
 
 static int post_one_recv(struct rxe_rq *rq, const struct ib_recv_wr *ibwr)
@@ -312,7 +312,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	if (err)
 		goto err1;
 
-	rxe_add_ref(pd);
+	rxe_add_ref(&pd->pelem);
 	srq->pd = pd;
 
 	err = rxe_srq_from_init(rxe, srq, init, udata, uresp);
@@ -322,8 +322,8 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	return 0;
 
 err2:
-	rxe_drop_ref(pd);
-	rxe_drop_ref(srq);
+	rxe_drop_ref(&pd->pelem);
+	rxe_drop_ref(&srq->pelem);
 err1:
 	return err;
 }
@@ -380,8 +380,8 @@ static void rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
 	if (srq->rq.queue)
 		rxe_queue_cleanup(srq->rq.queue);
 
-	rxe_drop_ref(srq->pd);
-	rxe_drop_ref(srq);
+	rxe_drop_ref(&srq->pd->pelem);
+	rxe_drop_ref(&srq->pelem);
 }
 
 static int rxe_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
@@ -442,7 +442,7 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
 		qp->is_user = 1;
 	}
 
-	rxe_add_index(qp);
+	rxe_add_index(&qp->pelem);
 
 	err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibpd, udata);
 	if (err)
@@ -451,9 +451,9 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
 	return &qp->ibqp;
 
 err3:
-	rxe_drop_index(qp);
+	rxe_drop_index(&qp->pelem);
 err2:
-	rxe_drop_ref(qp);
+	rxe_drop_ref(&qp->pelem);
 err1:
 	return ERR_PTR(err);
 }
@@ -495,8 +495,8 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	struct rxe_qp *qp = to_rqp(ibqp);
 
 	rxe_qp_destroy(qp);
-	rxe_drop_index(qp);
-	rxe_drop_ref(qp);
+	rxe_drop_index(&qp->pelem);
+	rxe_drop_ref(&qp->pelem);
 	return 0;
 }
 
@@ -814,7 +814,7 @@ static void rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
 
 	rxe_cq_disable(cq);
 
-	rxe_drop_ref(cq);
+	rxe_drop_ref(&cq->pelem);
 }
 
 static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
@@ -904,9 +904,9 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 		goto err1;
 	}
 
-	rxe_add_index(mr);
+	rxe_add_index(&mr->pelem);
 
-	rxe_add_ref(pd);
+	rxe_add_ref(&pd->pelem);
 
 	err = rxe_mem_init_dma(pd, access, mr);
 	if (err)
@@ -915,9 +915,9 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 	return &mr->ibmr;
 
 err2:
-	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
-	rxe_drop_ref(mr);
+	rxe_drop_ref(&pd->pelem);
+	rxe_drop_index(&mr->pelem);
+	rxe_drop_ref(&mr->pelem);
 err1:
 	return ERR_PTR(err);
 }
@@ -939,9 +939,9 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 		goto err2;
 	}
 
-	rxe_add_index(mr);
+	rxe_add_index(&mr->pelem);
 
-	rxe_add_ref(pd);
+	rxe_add_ref(&pd->pelem);
 
 	err = rxe_mem_init_user(pd, start, length, iova,
 				access, udata, mr);
@@ -951,9 +951,9 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 	return &mr->ibmr;
 
 err3:
-	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
-	rxe_drop_ref(mr);
+	rxe_drop_ref(&pd->pelem);
+	rxe_drop_index(&mr->pelem);
+	rxe_drop_ref(&mr->pelem);
 err2:
 	return ERR_PTR(err);
 }
@@ -963,9 +963,9 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 	struct rxe_mem *mr = to_rmr(ibmr);
 
 	mr->state = RXE_MEM_STATE_ZOMBIE;
-	rxe_drop_ref(mr->pd);
-	rxe_drop_index(mr);
-	rxe_drop_ref(mr);
+	rxe_drop_ref(&mr->pd->pelem);
+	rxe_drop_index(&mr->pelem);
+	rxe_drop_ref(&mr->pelem);
 	return 0;
 }
 
@@ -986,9 +986,9 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 		goto err1;
 	}
 
-	rxe_add_index(mr);
+	rxe_add_index(&mr->pelem);
 
-	rxe_add_ref(pd);
+	rxe_add_ref(&pd->pelem);
 
 	err = rxe_mem_init_fast(pd, max_num_sg, mr);
 	if (err)
@@ -997,9 +997,9 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
 	return &mr->ibmr;
 
 err2:
-	rxe_drop_ref(pd);
-	rxe_drop_index(mr);
-	rxe_drop_ref(mr);
+	rxe_drop_ref(&pd->pelem);
+	rxe_drop_index(&mr->pelem);
+	rxe_drop_ref(&mr->pelem);
 err1:
 	return ERR_PTR(err);
 }
@@ -1057,7 +1057,7 @@ static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
 
 	err = rxe_mcast_add_grp_elem(rxe, qp, grp);
 
-	rxe_drop_ref(grp);
+	rxe_drop_ref(&grp->pelem);
 	return err;
 }
 
-- 
2.20.1



* [PATCH 04/10] Protect kref_put with the lock
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (2 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 03/10] Make pool interface more type safe Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:25   ` Jason Gunthorpe
  2019-07-22 15:14 ` [PATCH 05/10] Fix reference counting for rxe tasklets Maksym Planeta
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

We need to ensure that kref_put does not run concurrently with the
lookup loop inside rxe_pool_get_key.
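A hypothetical interleaving of the race being closed (a simplified
sketch, not the exact call paths):

  CPU0: rxe_pool_get_key()            CPU1: old rxe_drop_ref()
  read_lock(&pool->pool_lock)
  walk the rb tree, find elem
                                      kref_put() drops refcount to zero
                                      rxe_elem_release() frees elem
  kref_get(&elem->ref_cnt)            <- increments freed memory
  read_unlock(&pool->pool_lock)

With kref_put moved under pool_lock, the final put can no longer
interleave with the lookup's kref_get.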

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 18 ++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_pool.h |  4 +---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index efa9bab01e02..30a887cf9200 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -536,3 +536,21 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
 	read_unlock_irqrestore(&pool->pool_lock, flags);
 	return node ? elem : NULL;
 }
+
+static void rxe_dummy_release(struct kref *kref)
+{
+}
+
+void rxe_drop_ref(struct rxe_pool_entry *pelem)
+{
+	int res;
+	struct rxe_pool *pool = pelem->pool;
+	unsigned long flags;
+
+	write_lock_irqsave(&pool->pool_lock, flags);
+	res = kref_put(&pelem->ref_cnt, rxe_dummy_release);
+	write_unlock_irqrestore(&pool->pool_lock, flags);
+	if (res) {
+		rxe_elem_release(&pelem->ref_cnt);
+	}
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 5c6a9429f541..b90cc84c5511 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -166,8 +166,6 @@ static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
 }
 
 /* drop a reference on an object */
-static inline void rxe_drop_ref(struct rxe_pool_entry *pelem) {
-	kref_put(&pelem->ref_cnt, rxe_elem_release);
-}
+void rxe_drop_ref(struct rxe_pool_entry *pelem);
 
 #endif /* RXE_POOL_H */
-- 
2.20.1



* [PATCH 05/10] Fix reference counting for rxe tasklets
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (3 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 04/10] Protect kref_put with the lock Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:27   ` Jason Gunthorpe
  2019-07-22 15:14 ` [PATCH 06/10] Remove pd from rxe_ah Maksym Planeta
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

The reference should be acquired *before* the tasklet is run. The best
time to increment the reference counter is at initialisation. Otherwise,
the object may no longer exist by the time the tasklet runs.
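A hypothetical interleaving with the old code, where the reference was
taken inside the tasklet body (a sketch):

  CPU0                                CPU1
  tasklet_schedule(&qp->req.task)
                                      rxe_qp_destroy(qp)
                                      last reference dropped,
                                      qp memory reclaimed
  rxe_do_task() runs
    rxe_add_ref(&qp->pelem)           <- use-after-free

Taking the reference in rxe_init_task() and dropping it in
rxe_cleanup_task() keeps the QP alive for the whole lifetime of the
task.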

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 4 ----
 drivers/infiniband/sw/rxe/rxe_req.c  | 4 ----
 drivers/infiniband/sw/rxe/rxe_resp.c | 3 ---
 drivers/infiniband/sw/rxe/rxe_task.c | 8 ++++++--
 drivers/infiniband/sw/rxe/rxe_task.h | 2 +-
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index bdeb288673f3..3a1a33286f87 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -557,8 +557,6 @@ int rxe_completer(void *arg)
 	struct rxe_pkt_info *pkt = NULL;
 	enum comp_state state;
 
-	rxe_add_ref(&qp->pelem);
-
 	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
 	    qp->req.state == QP_STATE_RESET) {
 		rxe_drain_resp_pkts(qp, qp->valid &&
@@ -780,7 +778,6 @@ int rxe_completer(void *arg)
 	 * exit from the loop calling us
 	 */
 	WARN_ON_ONCE(skb);
-	rxe_drop_ref(&qp->pelem);
 	return -EAGAIN;
 
 done:
@@ -788,6 +785,5 @@ int rxe_completer(void *arg)
 	 * us again to see if there is anything else to do
 	 */
 	WARN_ON_ONCE(skb);
-	rxe_drop_ref(&qp->pelem);
 	return 0;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 0d018adbe512..c63e66873757 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -594,8 +594,6 @@ int rxe_requester(void *arg)
 	struct rxe_send_wqe rollback_wqe;
 	u32 rollback_psn;
 
-	rxe_add_ref(&qp->pelem);
-
 next_wqe:
 	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
 		goto exit;
@@ -702,7 +700,6 @@ int rxe_requester(void *arg)
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
 			__rxe_do_task(&qp->comp.task);
-			rxe_drop_ref(&qp->pelem);
 			return 0;
 		}
 		payload = mtu;
@@ -753,6 +750,5 @@ int rxe_requester(void *arg)
 	__rxe_do_task(&qp->comp.task);
 
 exit:
-	rxe_drop_ref(&qp->pelem);
 	return -EAGAIN;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f038bae1bd70..7be8a362d2ef 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1212,8 +1212,6 @@ int rxe_responder(void *arg)
 	struct rxe_pkt_info *pkt = NULL;
 	int ret = 0;
 
-	rxe_add_ref(&qp->pelem);
-
 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
 	if (!qp->valid) {
@@ -1392,6 +1390,5 @@ int rxe_responder(void *arg)
 exit:
 	ret = -EAGAIN;
 done:
-	rxe_drop_ref(&qp->pelem);
 	return ret;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 7c2b6e4595f5..9d6b8ad08a3a 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -35,6 +35,7 @@
 #include <linux/interrupt.h>
 #include <linux/hardirq.h>
 
+#include "rxe.h"
 #include "rxe_task.h"
 
 int __rxe_do_task(struct rxe_task *task)
@@ -115,14 +116,15 @@ void rxe_do_task(unsigned long data)
 }
 
 int rxe_init_task(void *obj, struct rxe_task *task,
-		  void *arg, int (*func)(void *), char *name)
+		  struct rxe_qp *qp, int (*func)(void *), char *name)
 {
 	task->obj	= obj;
-	task->arg	= arg;
+	task->arg	= qp;
 	task->func	= func;
 	snprintf(task->name, sizeof(task->name), "%s", name);
 	task->destroyed	= false;
 
+	rxe_add_ref(&qp->pelem);
 	tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
 
 	task->state = TASK_STATE_START;
@@ -135,6 +137,7 @@ void rxe_cleanup_task(struct rxe_task *task)
 {
 	unsigned long flags;
 	bool idle;
+	struct rxe_qp *qp = (struct rxe_qp *)task->arg;
 
 	/*
 	 * Mark the task, then wait for it to finish. It might be
@@ -149,6 +152,7 @@ void rxe_cleanup_task(struct rxe_task *task)
 	} while (!idle);
 
 	tasklet_kill(&task->tasklet);
+	rxe_drop_ref(&qp->pelem);
 }
 
 void rxe_run_task(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 5c1fc7d5b953..671b1774b577 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -63,7 +63,7 @@ struct rxe_task {
  *	fcn  => function to call until it returns != 0
  */
 int rxe_init_task(void *obj, struct rxe_task *task,
-		  void *arg, int (*func)(void *), char *name);
+		  struct rxe_qp *qp, int (*func)(void *), char *name);
 
 /* cleanup task */
 void rxe_cleanup_task(struct rxe_task *task);
-- 
2.20.1



* [PATCH 06/10] Remove pd from rxe_ah
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (4 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 05/10] Fix reference counting for rxe tasklets Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 07/10] Pass the return value of kref_put further Maksym Planeta
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

The pointer to the PD is not used in rxe_ah anymore.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_verbs.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 5c4b2239129c..8dd65c2a7c72 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -73,7 +73,6 @@ struct rxe_pd {
 struct rxe_ah {
 	struct ib_ah		ibah;
 	struct rxe_pool_entry	pelem;
-	struct rxe_pd		*pd;
 	struct rxe_av		av;
 };
 
-- 
2.20.1



* [PATCH 07/10] Pass the return value of kref_put further
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (5 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 06/10] Remove pd from rxe_ah Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:29   ` Jason Gunthorpe
  2019-07-22 15:14 ` [PATCH 08/10] Move responsibility of cleaning up pool elements Maksym Planeta
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Used in a later patch.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 3 ++-
 drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 30a887cf9200..711d7d7f3692 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -541,7 +541,7 @@ static void rxe_dummy_release(struct kref *kref)
 {
 }
 
-void rxe_drop_ref(struct rxe_pool_entry *pelem)
+int rxe_drop_ref(struct rxe_pool_entry *pelem)
 {
 	int res;
 	struct rxe_pool *pool = pelem->pool;
@@ -553,4 +553,5 @@ void rxe_drop_ref(struct rxe_pool_entry *pelem)
 	if (res) {
 		rxe_elem_release(&pelem->ref_cnt);
 	}
+	return res;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index b90cc84c5511..1e3508c74dc0 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -166,6 +166,6 @@ static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
 }
 
 /* drop a reference on an object */
-void rxe_drop_ref(struct rxe_pool_entry *pelem);
+int rxe_drop_ref(struct rxe_pool_entry *pelem);
 
 #endif /* RXE_POOL_H */
-- 
2.20.1



* [PATCH 08/10] Move responsibility of cleaning up pool elements
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (6 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 07/10] Pass the return value of kref_put further Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 09/10] Consolidate resetting of QP's tasks into one place Maksym Planeta
  2019-07-22 15:14 ` [PATCH 10/10] Replace tasklets with workqueues Maksym Planeta
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Specific implementations must finish the cleanup of pool elements at the
right moment. The reason is that a concrete cleanup function may want to
postpone part of the object destruction and schedule it for a later
time.
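The resulting call chain looks like this (a sketch):

  kref_put()
    -> rxe_elem_release()
         -> pool->cleanup(elem)          /* e.g. rxe_cq_cleanup() */
              -> type-specific teardown
              -> rxe_elem_cleanup(elem)  /* frees memory, drops pool */

Because rxe_elem_cleanup() is now called from the type-specific cleanup,
an implementation may defer part of the teardown and release the
element's memory only once it is safe to do so.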

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_cq.c    | 2 ++
 drivers/infiniband/sw/rxe/rxe_mcast.c | 2 ++
 drivers/infiniband/sw/rxe/rxe_mr.c    | 2 ++
 drivers/infiniband/sw/rxe/rxe_pool.c  | 9 ++++++++-
 drivers/infiniband/sw/rxe/rxe_pool.h  | 2 ++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index ad3090131126..5693986c8c1b 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -182,4 +182,6 @@ void rxe_cq_cleanup(struct rxe_pool_entry *arg)
 
 	if (cq->queue)
 		rxe_queue_cleanup(cq->queue);
+
+	rxe_elem_cleanup(arg);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 24ebc4ca1b99..6453ae97653f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -187,4 +187,6 @@ void rxe_mc_cleanup(struct rxe_pool_entry *arg)
 
 	rxe_drop_key(&grp->pelem);
 	rxe_mcast_delete(rxe, &grp->mgid);
+
+	rxe_elem_cleanup(arg);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 441b01e30afa..1c0940655df1 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -104,6 +104,8 @@ void rxe_mem_cleanup(struct rxe_pool_entry *arg)
 
 		kfree(mem->map);
 	}
+
+	rxe_elem_cleanup(arg);
 }
 
 static int rxe_mem_alloc(struct rxe_mem *mem, int num_buf)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 711d7d7f3692..3b14ab599662 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -462,9 +462,16 @@ void rxe_elem_release(struct kref *kref)
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
+	else
+		rxe_elem_cleanup(elem);
+}
+
+void rxe_elem_cleanup(struct rxe_pool_entry *pelem)
+{
+	struct rxe_pool *pool = pelem->pool;
 
 	if (!(pool->flags & RXE_POOL_NO_ALLOC))
-		kmem_cache_free(pool_cache(pool), elem);
+		kmem_cache_free(pool_cache(pool), pelem);
 	atomic_dec(&pool->num_elem);
 	ib_device_put(&pool->rxe->ib_dev);
 	rxe_pool_put(pool);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 1e3508c74dc0..84cfbc749a5c 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -160,6 +160,8 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
 /* cleanup an object when all references are dropped */
 void rxe_elem_release(struct kref *kref);
 
+void rxe_elem_cleanup(struct rxe_pool_entry *pelem);
+
 /* take a reference on an object */
 static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
 	kref_get(&pelem->ref_cnt);
-- 
2.20.1



* [PATCH 09/10] Consolidate resetting of QP's tasks into one place
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (7 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 08/10] Move responsibility of cleaning up pool elements Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:14 ` [PATCH 10/10] Replace tasklets with workqueues Maksym Planeta
  9 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Used in a later patch.

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_loc.h  |  1 +
 drivers/infiniband/sw/rxe/rxe_qp.c   | 17 +----------------
 drivers/infiniband/sw/rxe/rxe_req.c  |  1 +
 drivers/infiniband/sw/rxe/rxe_resp.c | 25 ++++++++++++++-----------
 4 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index b7159d9d5107..e37adde2bced 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -205,6 +205,7 @@ static inline int rcv_wqe_size(int max_sge)
 }
 
 void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res);
+void cleanup_rd_atomic_resources(struct rxe_qp *qp);
 
 static inline void rxe_advance_resp_resource(struct rxe_qp *qp)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 7cd929185581..2fccdede8869 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -161,7 +161,7 @@ void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
 	res->type = 0;
 }
 
-static void cleanup_rd_atomic_resources(struct rxe_qp *qp)
+void cleanup_rd_atomic_resources(struct rxe_qp *qp)
 {
 	int i;
 	struct resp_res *res;
@@ -526,21 +526,6 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 
 	/* cleanup attributes */
 	atomic_set(&qp->ssn, 0);
-	qp->req.opcode = -1;
-	qp->req.need_retry = 0;
-	qp->req.noack_pkts = 0;
-	qp->resp.msn = 0;
-	qp->resp.opcode = -1;
-	qp->resp.drop_msg = 0;
-	qp->resp.goto_error = 0;
-	qp->resp.sent_psn_nak = 0;
-
-	if (qp->resp.mr) {
-		rxe_drop_ref(&qp->resp.mr->pelem);
-		qp->resp.mr = NULL;
-	}
-
-	cleanup_rd_atomic_resources(qp);
 
 	/* reenable tasks */
 	rxe_enable_task(&qp->resp.task);
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index c63e66873757..3bb9afdeaee1 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -599,6 +599,7 @@ int rxe_requester(void *arg)
 		goto exit;
 
 	if (unlikely(qp->req.state == QP_STATE_RESET)) {
+		qp->req.noack_pkts = 0;
 		qp->req.wqe_index = consumer_index(qp->sq.queue);
 		qp->req.opcode = -1;
 		qp->req.need_rd_atomic = 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 7be8a362d2ef..eaee68d718ce 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1214,19 +1214,10 @@ int rxe_responder(void *arg)
 
 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
-	if (!qp->valid) {
-		ret = -EINVAL;
-		goto done;
-	}
-
-	switch (qp->resp.state) {
-	case QP_STATE_RESET:
+	if (!qp->valid || qp->resp.state == QP_STATE_RESET) {
 		state = RESPST_RESET;
-		break;
-
-	default:
+	} else {
 		state = RESPST_GET_REQ;
-		break;
 	}
 
 	while (1) {
@@ -1374,6 +1365,18 @@ int rxe_responder(void *arg)
 		case RESPST_RESET:
 			rxe_drain_req_pkts(qp, false);
 			qp->resp.wqe = NULL;
+			qp->resp.msn = 0;
+			qp->resp.opcode = -1;
+			qp->resp.drop_msg = 0;
+			qp->resp.goto_error = 0;
+			qp->resp.sent_psn_nak = 0;
+
+			if (qp->resp.mr) {
+				rxe_drop_ref(&qp->resp.mr->pelem);
+				qp->resp.mr = NULL;
+			}
+
+			cleanup_rd_atomic_resources(qp);
 			goto exit;
 
 		case RESPST_ERROR:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/10] Replace tasklets with workqueues
  2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
                   ` (8 preceding siblings ...)
  2019-07-22 15:14 ` [PATCH 09/10] Consolidate resetting of QP's tasks into one place Maksym Planeta
@ 2019-07-22 15:14 ` Maksym Planeta
  2019-07-22 15:32   ` Jason Gunthorpe
  9 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
  To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma, linux-kernel
  Cc: Maksym Planeta

Replace tasklets with workqueues in the rxe driver.

Ensure that a task is invoked only through a workqueue. This allows the
task logic to be simplified.

Add additional dependencies to make sure that cleanup tasks do not run
after the object's memory has already been reclaimed.

Improve the overall stability of the driver by removing multiple race
conditions and use-after-free situations.
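
For context, the conversion follows the usual tasklet-to-workqueue
recipe. Below is a minimal generic sketch of that recipe with
hypothetical "foo" names; it is not the rxe code itself, the actual
changes are in the diff that follows.

	#include <linux/workqueue.h>

	struct foo {
		struct workqueue_struct	*wq;
		struct work_struct	work;
	};

	static void foo_work_fn(struct work_struct *work)
	{
		struct foo *foo = container_of(work, struct foo, work);

		/* Unlike a tasklet, this runs in process context and may sleep. */
	}

	static int foo_init(struct foo *foo)
	{
		/* An ordered workqueue preserves the serialization a tasklet gave us. */
		foo->wq = alloc_ordered_workqueue("foo", 0);
		if (!foo->wq)
			return -ENOMEM;
		INIT_WORK(&foo->work, foo_work_fn);
		return 0;
	}

	/* Where the tasklet version called tasklet_schedule(): */
	static void foo_kick(struct foo *foo)
	{
		queue_work(foo->wq, &foo->work);
	}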

Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
 drivers/infiniband/sw/rxe/rxe_cq.c    |  15 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |  17 +++-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  73 ++++++++-------
 drivers/infiniband/sw/rxe/rxe_req.c   |   4 +-
 drivers/infiniband/sw/rxe/rxe_task.c  | 126 ++++++++------------------
 drivers/infiniband/sw/rxe/rxe_task.h  |  38 ++------
 drivers/infiniband/sw/rxe/rxe_verbs.c |   7 ++
 drivers/infiniband/sw/rxe/rxe_verbs.h |   7 +-
 8 files changed, 124 insertions(+), 163 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index 5693986c8c1b..6c3cf5ba2911 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -66,9 +66,9 @@ int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
 	return -EINVAL;
 }
 
-static void rxe_send_complete(unsigned long data)
+static void rxe_send_complete(struct work_struct *work)
 {
-	struct rxe_cq *cq = (struct rxe_cq *)data;
+	struct rxe_cq *cq = container_of(work, typeof(*cq), work);
 	unsigned long flags;
 
 	spin_lock_irqsave(&cq->cq_lock, flags);
@@ -106,8 +106,12 @@ int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
 		cq->is_user = 1;
 
 	cq->is_dying = false;
+	INIT_WORK(&cq->work, rxe_send_complete);
 
-	tasklet_init(&cq->comp_task, rxe_send_complete, (unsigned long)cq);
+	cq->wq = alloc_ordered_workqueue("cq:send_complete", 0);
+	if (!cq->wq) {
+		return -ENOMEM;
+	}
 
 	spin_lock_init(&cq->cq_lock);
 	cq->ibcq.cqe = cqe;
@@ -161,7 +165,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
 	if ((cq->notify == IB_CQ_NEXT_COMP) ||
 	    (cq->notify == IB_CQ_SOLICITED && solicited)) {
 		cq->notify = 0;
-		tasklet_schedule(&cq->comp_task);
+		queue_work(cq->wq, &cq->work);
 	}
 
 	return 0;
@@ -183,5 +187,8 @@ void rxe_cq_cleanup(struct rxe_pool_entry *arg)
 	if (cq->queue)
 		rxe_queue_cleanup(cq->queue);
 
+	if (cq->wq)
+		destroy_workqueue(cq->wq);
+
 	rxe_elem_cleanup(arg);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index f8c3604bc5ad..2997edc27592 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -410,13 +410,20 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
 	struct rxe_qp *qp = sk->sk_user_data;
-	int skb_out = atomic_dec_return(&qp->skb_out);
+	int skb_out = atomic_read(&qp->skb_out);
 
-	if (unlikely(qp->need_req_skb &&
-		     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
-		rxe_run_task(&qp->req.task);
+	atomic_inc(&qp->pending_skb_down);
+	skb_out = atomic_read(&qp->pending_skb_down);
+	if (!rxe_drop_ref(&qp->pelem)) {
+		atomic_dec(&qp->pending_skb_down);
+		atomic_dec(&qp->skb_out);
 
-	rxe_drop_ref(&qp->pelem);
+		if (unlikely(qp->need_req_skb &&
+			     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
+			rxe_run_task(&qp->req.task);
+
+		skb_out = atomic_read(&qp->pending_skb_down);
+	}
 }
 
 int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 2fccdede8869..2bb25360319e 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -214,6 +214,7 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	atomic_set(&qp->ssn, 0);
 	atomic_set(&qp->skb_out, 0);
+	atomic_set(&qp->pending_skb_down, 0);
 }
 
 static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
@@ -332,6 +333,8 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
 	return 0;
 }
 
+void rxe_qp_do_cleanup(struct work_struct *work);
+
 /* called by the create qp verb */
 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 		     struct ib_qp_init_attr *init,
@@ -368,6 +371,9 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
 	qp->attr.qp_state = IB_QPS_RESET;
 	qp->valid = 1;
 
+	INIT_WORK(&qp->cleanup_work, rxe_qp_do_cleanup);
+	qp->cleanup_completion = NULL;
+
 	return 0;
 
 err2:
@@ -499,16 +505,6 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
 /* move the qp to the reset state */
 static void rxe_qp_reset(struct rxe_qp *qp)
 {
-	/* stop tasks from running */
-	rxe_disable_task(&qp->resp.task);
-
-	/* stop request/comp */
-	if (qp->sq.queue) {
-		if (qp_type(qp) == IB_QPT_RC)
-			rxe_disable_task(&qp->comp.task);
-		rxe_disable_task(&qp->req.task);
-	}
-
 	/* move qp to the reset state */
 	qp->req.state = QP_STATE_RESET;
 	qp->resp.state = QP_STATE_RESET;
@@ -516,26 +512,16 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	/* let state machines reset themselves drain work and packet queues
 	 * etc.
 	 */
-	__rxe_do_task(&qp->resp.task);
+	rxe_run_task_wait(&qp->req.task);
 
 	if (qp->sq.queue) {
-		__rxe_do_task(&qp->comp.task);
-		__rxe_do_task(&qp->req.task);
+		rxe_run_task_wait(&qp->comp.task);
+		rxe_run_task_wait(&qp->req.task);
 		rxe_queue_reset(qp->sq.queue);
 	}
 
 	/* cleanup attributes */
 	atomic_set(&qp->ssn, 0);
-
-	/* reenable tasks */
-	rxe_enable_task(&qp->resp.task);
-
-	if (qp->sq.queue) {
-		if (qp_type(qp) == IB_QPT_RC)
-			rxe_enable_task(&qp->comp.task);
-
-		rxe_enable_task(&qp->req.task);
-	}
 }
 
 /* drain the send queue */
@@ -547,7 +533,7 @@ static void rxe_qp_drain(struct rxe_qp *qp)
 			if (qp_type(qp) == IB_QPT_RC)
 				rxe_run_task(&qp->comp.task);
 			else
-				__rxe_do_task(&qp->comp.task);
+				rxe_run_task_wait(&qp->comp.task);
 			rxe_run_task(&qp->req.task);
 		}
 	}
@@ -566,7 +552,7 @@ void rxe_qp_error(struct rxe_qp *qp)
 	if (qp_type(qp) == IB_QPT_RC)
 		rxe_run_task(&qp->comp.task);
 	else
-		__rxe_do_task(&qp->comp.task);
+		rxe_run_task_wait(&qp->comp.task);
 	rxe_run_task(&qp->req.task);
 }
 
@@ -778,18 +764,22 @@ void rxe_qp_destroy(struct rxe_qp *qp)
 	rxe_cleanup_task(&qp->req.task);
 	rxe_cleanup_task(&qp->comp.task);
 
-	/* flush out any receive wr's or pending requests */
-	__rxe_do_task(&qp->req.task);
-	if (qp->sq.queue) {
-		__rxe_do_task(&qp->comp.task);
-		__rxe_do_task(&qp->req.task);
+	while (true) {
+		int skb_out;
+		skb_out = atomic_read(&qp->skb_out);
+		pr_debug("Waiting for %d skbs to flush at qp#%d\n", skb_out, qp_num(qp));
+		if (skb_out > 0)
+			msleep(10);
+		else
+			break;
 	}
 }
 
 /* called when the last reference to the qp is dropped */
-static void rxe_qp_do_cleanup(struct work_struct *work)
+void rxe_qp_do_cleanup(struct work_struct *work)
 {
-	struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work.work);
+	int pending;
+	struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work);
 
 	rxe_drop_all_mcast_groups(qp);
 
@@ -821,12 +811,25 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 
 	kernel_sock_shutdown(qp->sk, SHUT_RDWR);
 	sock_release(qp->sk);
+
+	do {
+		pending = atomic_dec_return(&qp->pending_skb_down);
+		if (pending < 0) {
+			atomic_inc(&qp->pending_skb_down);
+			break;
+		}
+
+		atomic_dec(&qp->skb_out);
+	} while (pending);
+
+	BUG_ON(!qp->cleanup_completion);
+	complete(qp->cleanup_completion);
+
+	rxe_elem_cleanup(&qp->pelem);
 }
 
-/* called when the last reference to the qp is dropped */
 void rxe_qp_cleanup(struct rxe_pool_entry *arg)
 {
 	struct rxe_qp *qp = container_of(arg, typeof(*qp), pelem);
-
-	execute_in_process_context(rxe_qp_do_cleanup, &qp->cleanup_work);
+	queue_work(system_highpri_wq, &qp->cleanup_work);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 3bb9afdeaee1..829e37208b8e 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -700,7 +700,7 @@ int rxe_requester(void *arg)
 						       qp->req.wqe_index);
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
-			__rxe_do_task(&qp->comp.task);
+			rxe_run_task_wait(&qp->comp.task);
 			return 0;
 		}
 		payload = mtu;
@@ -748,7 +748,7 @@ int rxe_requester(void *arg)
 err:
 	wqe->status = IB_WC_LOC_PROT_ERR;
 	wqe->state = wqe_state_error;
-	__rxe_do_task(&qp->comp.task);
+	rxe_run_task_wait(&qp->comp.task);
 
 exit:
 	return -EAGAIN;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 9d6b8ad08a3a..df6920361865 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -38,81 +38,39 @@
 #include "rxe.h"
 #include "rxe_task.h"
 
-int __rxe_do_task(struct rxe_task *task)
-
+/*
+ * common function called by any of the main work items
+ * If there is any chance that there is additional
+ * work to do someone must reschedule the task before
+ * leaving
+ */
+static void rxe_do_task(struct work_struct *work)
 {
+	struct rxe_task *task = container_of(work, typeof(*task), work);
+	struct rxe_qp *qp = (struct rxe_qp *)task->arg;
 	int ret;
 
-	while ((ret = task->func(task->arg)) == 0)
-		;
+	do {
+		if (task->destroyed) {
+			pr_debug("Running a destroyed task %p\n", task);
+		}
 
-	task->ret = ret;
+		if (!qp->valid) {
+			pr_debug("Running a task %p with an invalid qp#%d\n", task, qp_num(qp));
+		}
+
+		ret = task->func(task->arg);
+	} while (!ret);
 
-	return ret;
 }
 
-/*
- * this locking is due to a potential race where
- * a second caller finds the task already running
- * but looks just after the last call to func
- */
-void rxe_do_task(unsigned long data)
+static void rxe_do_task_notify(struct work_struct *work)
 {
-	int cont;
-	int ret;
-	unsigned long flags;
-	struct rxe_task *task = (struct rxe_task *)data;
-
-	spin_lock_irqsave(&task->state_lock, flags);
-	switch (task->state) {
-	case TASK_STATE_START:
-		task->state = TASK_STATE_BUSY;
-		spin_unlock_irqrestore(&task->state_lock, flags);
-		break;
-
-	case TASK_STATE_BUSY:
-		task->state = TASK_STATE_ARMED;
-		/* fall through */
-	case TASK_STATE_ARMED:
-		spin_unlock_irqrestore(&task->state_lock, flags);
-		return;
-
-	default:
-		spin_unlock_irqrestore(&task->state_lock, flags);
-		pr_warn("%s failed with bad state %d\n", __func__, task->state);
-		return;
-	}
+	struct rxe_task *task = container_of(work, typeof(*task), wait_work);
 
-	do {
-		cont = 0;
-		ret = task->func(task->arg);
-
-		spin_lock_irqsave(&task->state_lock, flags);
-		switch (task->state) {
-		case TASK_STATE_BUSY:
-			if (ret)
-				task->state = TASK_STATE_START;
-			else
-				cont = 1;
-			break;
-
-		/* soneone tried to run the task since the last time we called
-		 * func, so we will call one more time regardless of the
-		 * return value
-		 */
-		case TASK_STATE_ARMED:
-			task->state = TASK_STATE_BUSY;
-			cont = 1;
-			break;
-
-		default:
-			pr_warn("%s failed with bad state %d\n", __func__,
-				task->state);
-		}
-		spin_unlock_irqrestore(&task->state_lock, flags);
-	} while (cont);
+	rxe_do_task(&task->work);
 
-	task->ret = ret;
+	complete_all(&task->completion);
 }
 
 int rxe_init_task(void *obj, struct rxe_task *task,
@@ -125,18 +83,21 @@ int rxe_init_task(void *obj, struct rxe_task *task,
 	task->destroyed	= false;
 
 	rxe_add_ref(&qp->pelem);
-	tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
+	init_completion(&task->completion);
+
+	INIT_WORK(&task->work, rxe_do_task);
+	INIT_WORK(&task->wait_work, rxe_do_task_notify);
 
-	task->state = TASK_STATE_START;
-	spin_lock_init(&task->state_lock);
+	task->wq = alloc_ordered_workqueue("qp#%d:%s", 0, qp_num(qp), name);
+	if (!task->wq) {
+		return -ENOMEM;
+	}
 
 	return 0;
 }
 
 void rxe_cleanup_task(struct rxe_task *task)
 {
-	unsigned long flags;
-	bool idle;
 	struct rxe_qp *qp = (struct rxe_qp *)task->arg;
 
 	/*
@@ -145,30 +106,23 @@ void rxe_cleanup_task(struct rxe_task *task)
 	 */
 	task->destroyed = true;
 
-	do {
-		spin_lock_irqsave(&task->state_lock, flags);
-		idle = (task->state == TASK_STATE_START);
-		spin_unlock_irqrestore(&task->state_lock, flags);
-	} while (!idle);
+	rxe_run_task(task);
+
+	destroy_workqueue(task->wq);
 
-	tasklet_kill(&task->tasklet);
 	rxe_drop_ref(&qp->pelem);
 }
 
 void rxe_run_task(struct rxe_task *task)
 {
-	if (task->destroyed)
-		return;
-
-	tasklet_schedule(&task->tasklet);
+	queue_work(task->wq, &task->work);
 }
 
-void rxe_disable_task(struct rxe_task *task)
+void rxe_run_task_wait(struct rxe_task *task)
 {
-	tasklet_disable(&task->tasklet);
-}
+	reinit_completion(&task->completion);
 
-void rxe_enable_task(struct rxe_task *task)
-{
-	tasklet_enable(&task->tasklet);
+	queue_work(task->wq, &task->wait_work);
+
+	wait_for_completion(&task->completion);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 671b1774b577..c1568a05fb24 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -34,12 +34,6 @@
 #ifndef RXE_TASK_H
 #define RXE_TASK_H
 
-enum {
-	TASK_STATE_START	= 0,
-	TASK_STATE_BUSY		= 1,
-	TASK_STATE_ARMED	= 2,
-};
-
 /*
  * data structure to describe a 'task' which is a short
  * function that returns 0 as long as it needs to be
@@ -47,14 +41,14 @@ enum {
  */
 struct rxe_task {
 	void			*obj;
-	struct tasklet_struct	tasklet;
-	int			state;
-	spinlock_t		state_lock; /* spinlock for task state */
+	struct workqueue_struct	*wq;
 	void			*arg;
 	int			(*func)(void *arg);
-	int			ret;
 	char			name[16];
 	bool			destroyed;
+	struct work_struct	work;
+	struct work_struct	wait_work;
+	struct completion	completion;
 };
 
 /*
@@ -69,28 +63,14 @@ int rxe_init_task(void *obj, struct rxe_task *task,
 void rxe_cleanup_task(struct rxe_task *task);
 
 /*
- * raw call to func in loop without any checking
- * can call when tasklets are disabled
- */
-int __rxe_do_task(struct rxe_task *task);
-
-/*
- * common function called by any of the main tasklets
- * If there is any chance that there is additional
- * work to do someone must reschedule the task before
- * leaving
+ * schedule task to run on a workqueue.
  */
-void rxe_do_task(unsigned long data);
+void rxe_run_task(struct rxe_task *task);
 
 /*
- * schedule task to run as a tasklet.
+ * Run a task and wait until it completes. Recursive dependencies should be
+ * avoided.
  */
-void rxe_run_task(struct rxe_task *task);
-
-/* keep a task from scheduling */
-void rxe_disable_task(struct rxe_task *task);
-
-/* allow task to run */
-void rxe_enable_task(struct rxe_task *task);
+void rxe_run_task_wait(struct rxe_task *task);
 
 #endif /* RXE_TASK_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 545eff108070..ce54d44034f6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -493,10 +493,17 @@ static int rxe_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
 	struct rxe_qp *qp = to_rqp(ibqp);
+	DECLARE_COMPLETION_ONSTACK(cleanup_completion);
+
+	BUG_ON(qp->cleanup_completion);
+	qp->cleanup_completion = &cleanup_completion;
 
 	rxe_qp_destroy(qp);
 	rxe_drop_index(&qp->pelem);
 	rxe_drop_ref(&qp->pelem);
+
+	wait_for_completion(&cleanup_completion);
+
 	return 0;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 8dd65c2a7c72..d022826a2895 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -91,7 +91,8 @@ struct rxe_cq {
 	u8			notify;
 	bool			is_dying;
 	int			is_user;
-	struct tasklet_struct	comp_task;
+	struct workqueue_struct	*wq;
+	struct work_struct	work;
 };
 
 enum wqe_state {
@@ -270,6 +271,7 @@ struct rxe_qp {
 
 	atomic_t		ssn;
 	atomic_t		skb_out;
+	atomic_t		pending_skb_down;
 	int			need_req_skb;
 
 	/* Timer for retranmitting packet when ACKs have been lost. RC
@@ -285,7 +287,8 @@ struct rxe_qp {
 
 	spinlock_t		state_lock; /* guard requester and completer */
 
-	struct execute_work	cleanup_work;
+	struct work_struct	cleanup_work;
+	struct completion	*cleanup_completion;
 };
 
 enum rxe_mem_state {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/10] Protect kref_put with the lock
  2019-07-22 15:14 ` [PATCH 04/10] Protect kref_put with the lock Maksym Planeta
@ 2019-07-22 15:25   ` Jason Gunthorpe
  2019-07-22 15:28     ` Maksym Planeta
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2019-07-22 15:25 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Mon, Jul 22, 2019 at 05:14:20PM +0200, Maksym Planeta wrote:
> Need to ensure that kref_put does not run concurrently with the loop
> inside rxe_pool_get_key.
>
> Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
>  drivers/infiniband/sw/rxe/rxe_pool.c | 18 ++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_pool.h |  4 +---
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index efa9bab01e02..30a887cf9200 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -536,3 +536,21 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
>  	read_unlock_irqrestore(&pool->pool_lock, flags);
>  	return node ? elem : NULL;
>  }
> +
> +static void rxe_dummy_release(struct kref *kref)
> +{
> +}
> +
> +void rxe_drop_ref(struct rxe_pool_entry *pelem)
> +{
> +	int res;
> +	struct rxe_pool *pool = pelem->pool;
> +	unsigned long flags;
> +
> +	write_lock_irqsave(&pool->pool_lock, flags);
> +	res = kref_put(&pelem->ref_cnt, rxe_dummy_release);
> +	write_unlock_irqrestore(&pool->pool_lock, flags);

This doesn't make sense..

If something is making the kref go to 0 while the node is still in the
RB tree then that is a bug.

You should never need to add locking around a kref_put.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 05/10] Fix reference counting for rxe tasklets
  2019-07-22 15:14 ` [PATCH 05/10] Fix reference counting for rxe tasklets Maksym Planeta
@ 2019-07-22 15:27   ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2019-07-22 15:27 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Mon, Jul 22, 2019 at 05:14:21PM +0200, Maksym Planeta wrote:
>  
>  int rxe_init_task(void *obj, struct rxe_task *task,
> -		  void *arg, int (*func)(void *), char *name)
> +		  struct rxe_qp *qp, int (*func)(void *), char *name)
>  {
>  	task->obj	= obj;
> -	task->arg	= arg;
> +	task->arg	= qp;
>  	task->func	= func;
>  	snprintf(task->name, sizeof(task->name), "%s", name);
>  	task->destroyed	= false;
>  
> +	rxe_add_ref(&qp->pelem);

Please put the kref incrs near the copy of the pointer. Those things
are logically related - copy the pointer, incr the kref.
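
For example (an illustrative two-liner, mirroring the hunk above):

	task->arg = qp;			/* copy the pointer ... */
	rxe_add_ref(&qp->pelem);	/* ... and take the reference right beside it */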

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 04/10] Protect kref_put with the lock
  2019-07-22 15:25   ` Jason Gunthorpe
@ 2019-07-22 15:28     ` Maksym Planeta
  0 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:28 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel



On 22/07/2019 17:25, Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 05:14:20PM +0200, Maksym Planeta wrote:
>> Need to ensure that kref_put does not run concurrently with the loop
>> inside rxe_pool_get_key.
>>
>> Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
>>   drivers/infiniband/sw/rxe/rxe_pool.c | 18 ++++++++++++++++++
>>   drivers/infiniband/sw/rxe/rxe_pool.h |  4 +---
>>   2 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index efa9bab01e02..30a887cf9200 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -536,3 +536,21 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
>>   	read_unlock_irqrestore(&pool->pool_lock, flags);
>>   	return node ? elem : NULL;
>>   }
>> +
>> +static void rxe_dummy_release(struct kref *kref)
>> +{
>> +}
>> +
>> +void rxe_drop_ref(struct rxe_pool_entry *pelem)
>> +{
>> +	int res;
>> +	struct rxe_pool *pool = pelem->pool;
>> +	unsigned long flags;
>> +
>> +	write_lock_irqsave(&pool->pool_lock, flags);
>> +	res = kref_put(&pelem->ref_cnt, rxe_dummy_release);
>> +	write_unlock_irqrestore(&pool->pool_lock, flags);
> 
> This doesn't make sense..
> 
> If something is making the kref go to 0 while the node is still in the
> RB tree then that is a bug.
> 
> You should never need to add locking around a kref_put.
> 

From https://www.kernel.org/doc/Documentation/kref.txt

| The last rule (rule 3) is the nastiest one to handle.  Say, for
| instance, you have a list of items that are each kref-ed, and you wish
| to get the first one.  You can't just pull the first item off the list
| and kref_get() it.  That violates rule 3 because you are not already
| holding a valid pointer.  You must add a mutex (or some other lock).
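
Applied to a kref-ed object that lives in a lookup structure, that rule
comes out roughly as follows (a minimal generic sketch in the spirit of
kref.txt, not the rxe code itself):

	#include <linux/kref.h>
	#include <linux/list.h>
	#include <linux/mutex.h>
	#include <linux/slab.h>

	struct entry {
		struct list_head node;
		struct kref ref;
	};

	static DEFINE_MUTEX(list_lock);	/* protects the list and the final put */

	static void entry_release(struct kref *ref)
	{
		struct entry *e = container_of(ref, struct entry, ref);

		/* list_lock is held, so no concurrent lookup can find us. */
		list_del(&e->node);
		kfree(e);
	}

	static void entry_put(struct entry *e)
	{
		/*
		 * The lock closes the window between the count dropping
		 * to zero and the entry leaving the list, during which a
		 * lookup could otherwise kref_get() a dying object.
		 */
		mutex_lock(&list_lock);
		kref_put(&e->ref, entry_release);
		mutex_unlock(&list_lock);
	}

The kernel also ships kref_put_mutex() and kref_put_lock() helpers for
exactly this pattern.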


> Jason
> 

-- 
Regards,
Maksym Planeta

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 07/10] Pass the return value of kref_put further
  2019-07-22 15:14 ` [PATCH 07/10] Pass the return value of kref_put further Maksym Planeta
@ 2019-07-22 15:29   ` Jason Gunthorpe
  2019-07-22 15:31     ` Maksym Planeta
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2019-07-22 15:29 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Mon, Jul 22, 2019 at 05:14:23PM +0200, Maksym Planeta wrote:
> Used in a later patch.
> 
> Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
>  drivers/infiniband/sw/rxe/rxe_pool.c | 3 ++-
>  drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index 30a887cf9200..711d7d7f3692 100644
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -541,7 +541,7 @@ static void rxe_dummy_release(struct kref *kref)
>  {
>  }
>  
> -void rxe_drop_ref(struct rxe_pool_entry *pelem)
> +int rxe_drop_ref(struct rxe_pool_entry *pelem)
>  {
>  	int res;
>  	struct rxe_pool *pool = pelem->pool;
> @@ -553,4 +553,5 @@ void rxe_drop_ref(struct rxe_pool_entry *pelem)
>  	if (res) {
>  		rxe_elem_release(&pelem->ref_cnt);
>  	}
> +	return res;
>  }

Using the return value of kref_put at all is super sketchy. Are you
sure this is actually a kref usage pattern?

Why would this be needed?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 07/10] Pass the return value of kref_put further
  2019-07-22 15:29   ` Jason Gunthorpe
@ 2019-07-22 15:31     ` Maksym Planeta
  0 siblings, 0 replies; 22+ messages in thread
From: Maksym Planeta @ 2019-07-22 15:31 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

In a later patch I need to know whether the dependency on the QP object
still exists.

On 22/07/2019 17:29, Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 05:14:23PM +0200, Maksym Planeta wrote:
>> Used in a later patch.
>>
>> Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
>>   drivers/infiniband/sw/rxe/rxe_pool.c | 3 ++-
>>   drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>> index 30a887cf9200..711d7d7f3692 100644
>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>> @@ -541,7 +541,7 @@ static void rxe_dummy_release(struct kref *kref)
>>   {
>>   }
>>   
>> -void rxe_drop_ref(struct rxe_pool_entry *pelem)
>> +int rxe_drop_ref(struct rxe_pool_entry *pelem)
>>   {
>>   	int res;
>>   	struct rxe_pool *pool = pelem->pool;
>> @@ -553,4 +553,5 @@ void rxe_drop_ref(struct rxe_pool_entry *pelem)
>>   	if (res) {
>>   		rxe_elem_release(&pelem->ref_cnt);
>>   	}
>> +	return res;
>>   }
> 
> Using the return value of kref_put at all is super sketchy. Are you
> sure this is actually a kref usage pattern?
> 
> Why would this be needed?
> 
> Jason
> 

-- 
Regards,
Maksym Planeta

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-07-22 15:14 ` [PATCH 10/10] Replace tasklets with workqueues Maksym Planeta
@ 2019-07-22 15:32   ` Jason Gunthorpe
  2019-07-25 14:36     ` Maksym Planeta
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2019-07-22 15:32 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Mon, Jul 22, 2019 at 05:14:26PM +0200, Maksym Planeta wrote:
> Replace tasklets with workqueues in the rxe driver.
> 
> Ensure that a task is invoked only through a workqueue. This allows the
> task logic to be simplified.
> 
> Add additional dependencies to make sure that cleanup tasks do not run
> after the object's memory has already been reclaimed.
> 
> Improve the overall stability of the driver by removing multiple race
> conditions and use-after-free situations.

This should be described more precisely

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-07-22 15:32   ` Jason Gunthorpe
@ 2019-07-25 14:36     ` Maksym Planeta
  2019-07-25 18:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-25 14:36 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

Is this one better?

Replace tasklets with workqueues in the rxe driver. The reason for this
replacement is that tasklets are supposed to run atomically, whereas the
actual code may block.

Modify the SKB destructor for outgoing SKBs to schedule QP tasks only if
the QP has not itself been destroyed.

Add a variable "pending_skb_down" to ensure that the reference count of
a QP is decremented only once all QP access related to the skb is
finished.

Split out part of the pool element cleanup code so that it can be called
at the very end of cleanup, even when some of the cleanup is scheduled
for asynchronous execution. An example of where this happens is the
destructor for a QP.

Disallow calling task functions "directly". This simplifies the logic
inside rxe_task.c.

Schedule rxe_qp_do_cleanup onto the high-priority system workqueue,
because this function may itself be scheduled from the normal system
workqueue.

Before destroying a QP, wait until all references to it are gone.
Previously, outgoing SKBs could be freed after the QP they refer to had
been destroyed.
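
Schematically, the destroy path uses an on-stack completion (generic
shape, condensed from the rxe_destroy_qp() change in this patch):

	DECLARE_COMPLETION_ONSTACK(cleanup_done);

	qp->cleanup_completion = &cleanup_done;	/* completed by the cleanup work */
	rxe_qp_destroy(qp);
	rxe_drop_ref(&qp->pelem);		/* may schedule rxe_qp_do_cleanup() */
	wait_for_completion(&cleanup_done);	/* don't return while cleanup can still run */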

Add a blocking rxe_run_task variant to replace __rxe_do_task, which
called the task function directly.
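
The blocking variant itself is a plain completion handshake over the
task's ordered workqueue (condensed from rxe_task.c in this patch):

	static void rxe_do_task_notify(struct work_struct *work)
	{
		struct rxe_task *task = container_of(work, typeof(*task), wait_work);

		rxe_do_task(&task->work);		/* run the task to completion ... */
		complete_all(&task->completion);	/* ... then wake the waiter */
	}

	void rxe_run_task_wait(struct rxe_task *task)
	{
		reinit_completion(&task->completion);
		queue_work(task->wq, &task->wait_work);
		wait_for_completion(&task->completion);
	}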

On 22/07/2019 17:32, Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 05:14:26PM +0200, Maksym Planeta wrote:
>> Replace tasklets with workqueues in the rxe driver.
>>
>> Ensure that a task is invoked only through a workqueue. This allows the
>> task logic to be simplified.
>>
>> Add additional dependencies to make sure that cleanup tasks do not run
>> after the object's memory has already been reclaimed.
>>
>> Improve the overall stability of the driver by removing multiple race
>> conditions and use-after-free situations.
> 
> This should be described more precisely
> 
> Jason
> 

-- 
Regards,
Maksym Planeta

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-07-25 14:36     ` Maksym Planeta
@ 2019-07-25 18:50       ` Jason Gunthorpe
  2019-07-30 19:20         ` Maksym Planeta
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2019-07-25 18:50 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Thu, Jul 25, 2019 at 04:36:20PM +0200, Maksym Planeta wrote:
> Is this one better?
> 
> Replace tasklets with workqueues in the rxe driver. The reason for this
> replacement is that tasklets are supposed to run atomically, whereas the
> actual code may block.
> 
> Modify the SKB destructor for outgoing SKBs to schedule QP tasks only if
> the QP has not itself been destroyed.
> 
> Add a variable "pending_skb_down" to ensure that the reference count of
> a QP is decremented only once all QP access related to the skb is
> finished.
> 
> Split out part of the pool element cleanup code so that it can be called
> at the very end of cleanup, even when some of the cleanup is scheduled
> for asynchronous execution. An example of where this happens is the
> destructor for a QP.
> 
> Disallow calling task functions "directly". This simplifies the logic
> inside rxe_task.c.
> 
> Schedule rxe_qp_do_cleanup onto the high-priority system workqueue,
> because this function may itself be scheduled from the normal system
> workqueue.
> 
> Before destroying a QP, wait until all references to it are gone.
> Previously, outgoing SKBs could be freed after the QP they refer to had
> been destroyed.
> 
> Add a blocking rxe_run_task variant to replace __rxe_do_task, which
> called the task function directly.

Mostly but it would also be good to describe the use after free and
races more specifically

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-07-25 18:50       ` Jason Gunthorpe
@ 2019-07-30 19:20         ` Maksym Planeta
  2019-10-11 10:41           ` Maksym Planeta
  0 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-07-30 19:20 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel



On 25/07/2019 20:50, Jason Gunthorpe wrote:
> On Thu, Jul 25, 2019 at 04:36:20PM +0200, Maksym Planeta wrote:
>> Is this one better?
>>
>> Replace tasklets with workqueues in the rxe driver. The reason for this
>> replacement is that tasklets are supposed to run atomically, whereas the
>> actual code may block.
>>
>> Modify the SKB destructor for outgoing SKBs to schedule QP tasks only if
>> the QP has not itself been destroyed.
>>
>> Add a variable "pending_skb_down" to ensure that the reference count of
>> a QP is decremented only once all QP access related to the skb is
>> finished.
>>
>> Split out part of the pool element cleanup code so that it can be called
>> at the very end of cleanup, even when some of the cleanup is scheduled
>> for asynchronous execution. An example of where this happens is the
>> destructor for a QP.
>>
>> Disallow calling task functions "directly". This simplifies the logic
>> inside rxe_task.c.
>>
>> Schedule rxe_qp_do_cleanup onto the high-priority system workqueue,
>> because this function may itself be scheduled from the normal system
>> workqueue.
>>
>> Before destroying a QP, wait until all references to it are gone.
>> Previously, outgoing SKBs could be freed after the QP they refer to had
>> been destroyed.
>>
>> Add a blocking rxe_run_task variant to replace __rxe_do_task, which
>> called the task function directly.
> 
> Mostly but it would also be good to describe the use after free and
> races more specifically
> 

These situations are described in the cover letter (PATCH 00/10). Do you 
need a more detailed description than that?

> Jason
> 

-- 
Regards,
Maksym Planeta

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-07-30 19:20         ` Maksym Planeta
@ 2019-10-11 10:41           ` Maksym Planeta
  2019-10-22 19:04             ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Maksym Planeta @ 2019-10-11 10:41 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

Hi,

this is a kind reminder regarding the patchset. I have added a
description of the races to the original email.

On 30/07/2019 21:20, Maksym Planeta wrote:
> 
> 
> On 25/07/2019 20:50, Jason Gunthorpe wrote:
>> On Thu, Jul 25, 2019 at 04:36:20PM +0200, Maksym Planeta wrote:
>>> Is this one better?
>>>
>>> Replace tasklets with workqueues in the rxe driver. The reason for this
>>> replacement is that tasklets are supposed to run atomically, whereas the
>>> actual code may block.
>>>
>>> Modify the SKB destructor for outgoing SKBs to schedule QP tasks only if
>>> the QP has not itself been destroyed.
>>>
>>> Add a variable "pending_skb_down" to ensure that the reference count of
>>> a QP is decremented only once all QP access related to the skb is
>>> finished.
>>>
>>> Split out part of the pool element cleanup code so that it can be called
>>> at the very end of cleanup, even when some of the cleanup is scheduled
>>> for asynchronous execution. An example of where this happens is the
>>> destructor for a QP.
>>>
>>> Disallow calling task functions "directly". This simplifies the logic
>>> inside rxe_task.c.
>>>
>>> Schedule rxe_qp_do_cleanup onto the high-priority system workqueue,
>>> because this function may itself be scheduled from the normal system
>>> workqueue.
>>>
>>> Before destroying a QP, wait until all references to it are gone.
>>> Previously, outgoing SKBs could be freed after the QP they refer to had
>>> been destroyed.
>>>
>>> Add a blocking rxe_run_task variant to replace __rxe_do_task, which
>>> called the task function directly.
>>
>> Mostly but it would also be good to describe the use after free and
>> races more specifically
>>
> 
> These situations are described in the cover letter (PATCH 00/10). Do you 
> need a more detailed description than that?
> 
>> Jason
>>
> 

-- 
Regards,
Maksym Planeta

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 10/10] Replace tasklets with workqueues
  2019-10-11 10:41           ` Maksym Planeta
@ 2019-10-22 19:04             ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2019-10-22 19:04 UTC (permalink / raw)
  To: Maksym Planeta; +Cc: Moni Shoua, Doug Ledford, linux-rdma, linux-kernel

On Fri, Oct 11, 2019 at 12:41:50PM +0200, Maksym Planeta wrote:
> Hi,
> 
> this is a kind reminder regarding the patchset. I have added a
> description of the races to the original email.

Your patch isn't on Patchwork anymore; you will need to send a v2
with the updated commit descriptions.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-10-22 19:04 UTC | newest]

Thread overview: 22+ messages
2019-07-22 15:14 [PATCH 00/10] Refactor rxe driver to remove multiple race conditions Maksym Planeta
2019-07-22 15:14 ` [PATCH 01/10] Simplify rxe_run_task interface Maksym Planeta
2019-07-22 15:14 ` [PATCH 02/10] Remove counter that does not have a meaning anymore Maksym Planeta
2019-07-22 15:14 ` [PATCH 03/10] Make pool interface more type safe Maksym Planeta
2019-07-22 15:14 ` [PATCH 04/10] Protect kref_put with the lock Maksym Planeta
2019-07-22 15:25   ` Jason Gunthorpe
2019-07-22 15:28     ` Maksym Planeta
2019-07-22 15:14 ` [PATCH 05/10] Fix reference counting for rxe tasklets Maksym Planeta
2019-07-22 15:27   ` Jason Gunthorpe
2019-07-22 15:14 ` [PATCH 06/10] Remove pd form rxe_ah Maksym Planeta
2019-07-22 15:14 ` [PATCH 07/10] Pass the return value of kref_put further Maksym Planeta
2019-07-22 15:29   ` Jason Gunthorpe
2019-07-22 15:31     ` Maksym Planeta
2019-07-22 15:14 ` [PATCH 08/10] Move responsibility of cleaning up pool elements Maksym Planeta
2019-07-22 15:14 ` [PATCH 09/10] Consolidate resetting of QP's tasks into one place Maksym Planeta
2019-07-22 15:14 ` [PATCH 10/10] Replace tasklets with workqueues Maksym Planeta
2019-07-22 15:32   ` Jason Gunthorpe
2019-07-25 14:36     ` Maksym Planeta
2019-07-25 18:50       ` Jason Gunthorpe
2019-07-30 19:20         ` Maksym Planeta
2019-10-11 10:41           ` Maksym Planeta
2019-10-22 19:04             ` Jason Gunthorpe
