* [PATCH for-next 00/16] Implement work queues for rdma_rxe
@ 2022-10-18  4:33 Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 01/16] RDMA/rxe: Remove init of task locks from rxe_qp.c Bob Pearson
                   ` (16 more replies)
  0 siblings, 17 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

This patch series implements work queues as an alternative to
the main tasklets in the rdma_rxe driver. The first few patches
perform some cleanups of the current tasklet code, followed by a
patch that makes the internal API for task execution pluggable and
implements an inline and a tasklet-based set of functions.
The remaining patches clean up the qp reset and error code in the
three tasklets and modify the locking logic to prevent making
multiple calls to the tasklet scheduling routine. Finally, after
this preparation, the work queue equivalent set of functions is
added, and module parameters are implemented to allow tuning the
task types.
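The pluggable task interface described above can be sketched in userspace C as an ops table selected at init time. This is only an illustration of the pattern (names loosely mirror the series; the counters and the simplified "inline" backend are invented for the sketch and are not kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct task;

/* dispatch table: callers use run/sched without knowing the backend */
struct task_ops {
	void (*run)(struct task *task);
	void (*sched)(struct task *task);
};

struct task {
	const struct task_ops *ops;
	int (*func)(void *arg);	/* work function supplied at init */
	void *arg;
	int ret;		/* last return value of func */
	int sched_count;	/* how many deferrals were requested */
};

/* "inline" backend: both entry points just call func directly */
static void inline_run(struct task *task)
{
	task->ret = task->func(task->arg);
}

static void inline_sched(struct task *task)
{
	task->sched_count++;
	inline_run(task);	/* silently treat schedule as inline */
}

static const struct task_ops inline_ops = {
	.run = inline_run,
	.sched = inline_sched,
};

static void task_init(struct task *task, void *arg, int (*func)(void *),
		      const struct task_ops *ops)
{
	task->ops = ops;
	task->func = func;
	task->arg = arg;
	task->ret = 0;
	task->sched_count = 0;
}

/* generic wrappers, analogous to rxe_run_task()/rxe_sched_task() */
static void run_task(struct task *task)   { task->ops->run(task); }
static void sched_task(struct task *task) { task->ops->sched(task); }
```

A tasklet or work queue backend would supply its own ops table with the same two entry points, which is what lets the series swap execution models without touching call sites.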

The main advantage of the work queue version of deferred task
execution is that it has much better scalability and overall
performance than the tasklet variant. Tasklet performance saturates
with one connected queue pair and stays constant, while work queue
performance is slightly better for one queue pair and scales up with
the number of connected queue pairs. The perftest microbenchmarks in
local loopback mode (not a very realistic test case) can reach
approximately 100 Gb/sec with work queues, compared to about
16 Gb/sec with tasklets.

This patch series is derived from an earlier patch set developed by
Ian Ziemba at HPE, which is used in some Lustre storage clients attached
to Lustre servers with hardware RoCE v2 NICs.

Bob Pearson (16):
  RDMA/rxe: Remove init of task locks from rxe_qp.c
  RDMA/rxe: Removed unused name from rxe_task struct
  RDMA/rxe: Split rxe_run_task() into two subroutines
  RDMA/rxe: Make rxe_do_task static
  RDMA/rxe: Rename task->state_lock to task->lock
  RDMA/rxe: Make task interface pluggable
  RDMA/rxe: Simplify reset state handling in rxe_resp.c
  RDMA/rxe: Split rxe_drain_resp_pkts()
  RDMA/rxe: Handle qp error in rxe_resp.c
  RDMA/rxe: Cleanup comp tasks in rxe_qp.c
  RDMA/rxe: Remove __rxe_do_task()
  RDMA/rxe: Make tasks schedule each other
  RDMA/rxe: Implement disable/enable_task()
  RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
  RDMA/rxe: Add workqueue support for tasks
  RDMA/rxe: Add parameters to control task type

 drivers/infiniband/sw/rxe/rxe.c       |   9 +-
 drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
 drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
 drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
 drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
 9 files changed, 451 insertions(+), 207 deletions(-)


base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
-- 
2.34.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH for-next 01/16] RDMA/rxe: Remove init of task locks from rxe_qp.c
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 02/16] RDMA/rxe: Removed unused name from rxe_task struct Bob Pearson
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

The calls to spin_lock_init() for the tasklet spinlocks in
rxe_qp_init_misc() are redundant since they are initialized in
rxe_init_task(). This patch removes them.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index a62bab88415c..57c3f05ad15b 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -172,10 +172,6 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	spin_lock_init(&qp->state_lock);
 
-	spin_lock_init(&qp->req.task.state_lock);
-	spin_lock_init(&qp->resp.task.state_lock);
-	spin_lock_init(&qp->comp.task.state_lock);
-
 	spin_lock_init(&qp->sq.sq_lock);
 	spin_lock_init(&qp->rq.producer_lock);
 	spin_lock_init(&qp->rq.consumer_lock);
-- 
2.34.1



* [PATCH for-next 02/16] RDMA/rxe: Removed unused name from rxe_task struct
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 01/16] RDMA/rxe: Remove init of task locks from rxe_qp.c Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 03/16] RDMA/rxe: Split rxe_run_task() into two subroutines Bob Pearson
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

The name field in struct rxe_task is never used. This patch removes it.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c   | 9 +++------
 drivers/infiniband/sw/rxe/rxe_task.c | 4 +---
 drivers/infiniband/sw/rxe/rxe_task.h | 4 +---
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 57c3f05ad15b..03bd9f3e9956 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -238,10 +238,8 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->req_pkts);
 
-	rxe_init_task(&qp->req.task, qp,
-		      rxe_requester, "req");
-	rxe_init_task(&qp->comp.task, qp,
-		      rxe_completer, "comp");
+	rxe_init_task(&qp->req.task, qp, rxe_requester);
+	rxe_init_task(&qp->comp.task, qp, rxe_completer);
 
 	qp->qp_timeout_jiffies = 0; /* Can't be set for UD/UC in modify_qp */
 	if (init->qp_type == IB_QPT_RC) {
@@ -288,8 +286,7 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->resp_pkts);
 
-	rxe_init_task(&qp->resp.task, qp,
-		      rxe_responder, "resp");
+	rxe_init_task(&qp->resp.task, qp, rxe_responder);
 
 	qp->resp.opcode		= OPCODE_NONE;
 	qp->resp.msn		= 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index ec2b7de1c497..182d0532a8ab 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -94,12 +94,10 @@ void rxe_do_task(struct tasklet_struct *t)
 	task->ret = ret;
 }
 
-int rxe_init_task(struct rxe_task *task,
-		  void *arg, int (*func)(void *), char *name)
+int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
 {
 	task->arg	= arg;
 	task->func	= func;
-	snprintf(task->name, sizeof(task->name), "%s", name);
 	task->destroyed	= false;
 
 	tasklet_setup(&task->tasklet, rxe_do_task);
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 7f612a1c68a7..b3dfd970d1dc 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -25,7 +25,6 @@ struct rxe_task {
 	void			*arg;
 	int			(*func)(void *arg);
 	int			ret;
-	char			name[16];
 	bool			destroyed;
 };
 
@@ -34,8 +33,7 @@ struct rxe_task {
  *	arg  => parameter to pass to fcn
  *	func => function to call until it returns != 0
  */
-int rxe_init_task(struct rxe_task *task,
-		  void *arg, int (*func)(void *), char *name);
+int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *));
 
 /* cleanup task */
 void rxe_cleanup_task(struct rxe_task *task);
-- 
2.34.1



* [PATCH for-next 03/16] RDMA/rxe: Split rxe_run_task() into two subroutines
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 01/16] RDMA/rxe: Remove init of task locks from rxe_qp.c Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 02/16] RDMA/rxe: Removed unused name from rxe_task struct Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static Bob Pearson
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Split rxe_run_task(task, sched) into rxe_run_task(task) and
rxe_sched_task(task).
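The refactor can be illustrated in plain userspace C (an analogy only, not the driver code; the counters and queue_pkt() helper are invented for the sketch): a boolean mode flag is replaced by two explicitly named entry points, so call sites state their intent instead of passing a bare 0/1.

```c
#include <assert.h>
#include <stdbool.h>

/* counters stand in for the real execute-now vs. schedule paths */
static int ran_inline;
static int scheduled;

/* before: one entry point with a mode flag */
static void run_task_old(bool sched)
{
	if (sched)
		scheduled++;	/* defer to a scheduling mechanism */
	else
		ran_inline++;	/* execute directly in this context */
}

/* after: two entry points, chosen explicitly at the call site */
static void run_task(void)   { ran_inline++; }
static void sched_task(void) { scheduled++; }

/* a call site that previously passed must_sched as the flag */
static void queue_pkt(bool must_sched)
{
	if (must_sched)
		sched_task();
	else
		run_task();
}
```

Call sites with a constant flag (the common case in the diffs below) collapse to a single named call, and only the few sites with a computed condition keep an if/else.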

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  | 19 +++++++++++--------
 drivers/infiniband/sw/rxe/rxe_net.c   |  4 ++--
 drivers/infiniband/sw/rxe/rxe_qp.c    | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_req.c   | 10 +++++-----
 drivers/infiniband/sw/rxe/rxe_resp.c  |  5 ++++-
 drivers/infiniband/sw/rxe/rxe_task.c  | 15 ++++++++++-----
 drivers/infiniband/sw/rxe/rxe_task.h  |  7 +++----
 drivers/infiniband/sw/rxe/rxe_verbs.c |  8 ++++----
 8 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index fb0c008af78c..d2a250123617 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -118,7 +118,7 @@ void retransmit_timer(struct timer_list *t)
 
 	if (qp->valid) {
 		qp->comp.timeout = 1;
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_sched_task(&qp->comp.task);
 	}
 }
 
@@ -132,7 +132,10 @@ void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 	if (must_sched != 0)
 		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
 
-	rxe_run_task(&qp->comp.task, must_sched);
+	if (must_sched)
+		rxe_sched_task(&qp->comp.task);
+	else
+		rxe_run_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
@@ -305,7 +308,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task, 0);
+						rxe_run_task(&qp->req.task);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -452,7 +455,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 0);
+		rxe_run_task(&qp->req.task);
 	}
 }
 
@@ -466,7 +469,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 0);
+			rxe_run_task(&qp->req.task);
 		}
 	}
 
@@ -512,7 +515,7 @@ static inline enum comp_state complete_wqe(struct rxe_qp *qp,
 
 		if (qp->req.wait_psn) {
 			qp->req.wait_psn = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_sched_task(&qp->req.task);
 		}
 	}
 
@@ -646,7 +649,7 @@ int rxe_completer(void *arg)
 
 			if (qp->req.wait_psn) {
 				qp->req.wait_psn = 0;
-				rxe_run_task(&qp->req.task, 1);
+				rxe_sched_task(&qp->req.task);
 			}
 
 			state = COMPST_DONE;
@@ -714,7 +717,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task, 0);
+					rxe_run_task(&qp->req.task);
 				}
 				goto done;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 35f327b9d4b8..c36cad9c7a66 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -345,7 +345,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
 
 	if (unlikely(qp->need_req_skb &&
 		     skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
-		rxe_run_task(&qp->req.task, 1);
+		rxe_sched_task(&qp->req.task);
 
 	rxe_put(qp);
 }
@@ -429,7 +429,7 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	if ((qp_type(qp) != IB_QPT_RC) &&
 	    (pkt->mask & RXE_END_MASK)) {
 		pkt->wqe->state = wqe_state_done;
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_sched_task(&qp->comp.task);
 	}
 
 	rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 03bd9f3e9956..3f6d62a80bea 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -536,10 +536,10 @@ static void rxe_qp_drain(struct rxe_qp *qp)
 		if (qp->req.state != QP_STATE_DRAINED) {
 			qp->req.state = QP_STATE_DRAIN;
 			if (qp_type(qp) == IB_QPT_RC)
-				rxe_run_task(&qp->comp.task, 1);
+				rxe_sched_task(&qp->comp.task);
 			else
 				__rxe_do_task(&qp->comp.task);
-			rxe_run_task(&qp->req.task, 1);
+			rxe_sched_task(&qp->req.task);
 		}
 	}
 }
@@ -553,13 +553,13 @@ void rxe_qp_error(struct rxe_qp *qp)
 	qp->attr.qp_state = IB_QPS_ERR;
 
 	/* drain work and packet queues */
-	rxe_run_task(&qp->resp.task, 1);
+	rxe_sched_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC)
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_sched_task(&qp->comp.task);
 	else
 		__rxe_do_task(&qp->comp.task);
-	rxe_run_task(&qp->req.task, 1);
+	rxe_sched_task(&qp->req.task);
 }
 
 /* called by the modify qp verb */
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index f63771207970..41f1d84f0acb 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -105,7 +105,7 @@ void rnr_nak_timer(struct timer_list *t)
 	/* request a send queue retry */
 	qp->req.need_retry = 1;
 	qp->req.wait_for_rnr_timer = 0;
-	rxe_run_task(&qp->req.task, 1);
+	rxe_sched_task(&qp->req.task);
 }
 
 static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
@@ -608,7 +608,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 * which can lead to a deadlock. So go ahead and complete
 	 * it now.
 	 */
-	rxe_run_task(&qp->comp.task, 1);
+	rxe_sched_task(&qp->comp.task);
 
 	return 0;
 }
@@ -733,7 +733,7 @@ int rxe_requester(void *arg)
 						       qp->req.wqe_index);
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
-			rxe_run_task(&qp->comp.task, 0);
+			rxe_run_task(&qp->comp.task);
 			goto done;
 		}
 		payload = mtu;
@@ -795,7 +795,7 @@ int rxe_requester(void *arg)
 		rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
 
 		if (err == -EAGAIN) {
-			rxe_run_task(&qp->req.task, 1);
+			rxe_sched_task(&qp->req.task);
 			goto exit;
 		}
 
@@ -817,7 +817,7 @@ int rxe_requester(void *arg)
 	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
 	qp->req.state = QP_STATE_ERROR;
-	rxe_run_task(&qp->comp.task, 0);
+	rxe_run_task(&qp->comp.task);
 exit:
 	ret = -EAGAIN;
 out:
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index ed5a09e86417..2a2a50de51b2 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -91,7 +91,10 @@ void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
 			(skb_queue_len(&qp->req_pkts) > 1);
 
-	rxe_run_task(&qp->resp.task, must_sched);
+	if (must_sched)
+		rxe_sched_task(&qp->resp.task);
+	else
+		rxe_run_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 182d0532a8ab..446ee2c3d381 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -127,15 +127,20 @@ void rxe_cleanup_task(struct rxe_task *task)
 	tasklet_kill(&task->tasklet);
 }
 
-void rxe_run_task(struct rxe_task *task, int sched)
+void rxe_run_task(struct rxe_task *task)
 {
 	if (task->destroyed)
 		return;
 
-	if (sched)
-		tasklet_schedule(&task->tasklet);
-	else
-		rxe_do_task(&task->tasklet);
+	rxe_do_task(&task->tasklet);
+}
+
+void rxe_sched_task(struct rxe_task *task)
+{
+	if (task->destroyed)
+		return;
+
+	tasklet_schedule(&task->tasklet);
 }
 
 void rxe_disable_task(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index b3dfd970d1dc..590b1c1d7e7c 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -52,10 +52,9 @@ int __rxe_do_task(struct rxe_task *task);
  */
 void rxe_do_task(struct tasklet_struct *t);
 
-/* run a task, else schedule it to run as a tasklet, The decision
- * to run or schedule tasklet is based on the parameter sched.
- */
-void rxe_run_task(struct rxe_task *task, int sched);
+void rxe_run_task(struct rxe_task *task);
+
+void rxe_sched_task(struct rxe_task *task);
 
 /* keep a task from scheduling */
 void rxe_disable_task(struct rxe_task *task);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 88825edc7dce..f2f82efbaf6d 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -695,9 +695,9 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, const struct ib_send_wr *wr,
 		wr = next;
 	}
 
-	rxe_run_task(&qp->req.task, 1);
+	rxe_sched_task(&qp->req.task);
 	if (unlikely(qp->req.state == QP_STATE_ERROR))
-		rxe_run_task(&qp->comp.task, 1);
+		rxe_sched_task(&qp->comp.task);
 
 	return err;
 }
@@ -719,7 +719,7 @@ static int rxe_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
 
 	if (qp->is_user) {
 		/* Utilize process context to do protocol processing */
-		rxe_run_task(&qp->req.task, 0);
+		rxe_run_task(&qp->req.task);
 		return 0;
 	} else
 		return rxe_post_send_kernel(qp, wr, bad_wr);
@@ -759,7 +759,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 	spin_unlock_irqrestore(&rq->producer_lock, flags);
 
 	if (qp->resp.state == QP_STATE_ERROR)
-		rxe_run_task(&qp->resp.task, 1);
+		rxe_sched_task(&qp->resp.task);
 
 err1:
 	return err;
-- 
2.34.1



* [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (2 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 03/16] RDMA/rxe: Split rxe_run_task() into two subroutines Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-19  9:39   ` matsuda-daisuke
  2022-10-18  4:33 ` [PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to task->lock Bob Pearson
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

The subroutine rxe_do_task() is only called in rxe_task.c. This patch
makes it static and renames it do_task().

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_task.h | 8 --------
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 446ee2c3d381..097ddb16c230 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -28,7 +28,7 @@ int __rxe_do_task(struct rxe_task *task)
  * a second caller finds the task already running
  * but looks just after the last call to func
  */
-void rxe_do_task(struct tasklet_struct *t)
+static void do_task(struct tasklet_struct *t)
 {
 	int cont;
 	int ret;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 590b1c1d7e7c..99e0173e5c46 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -44,14 +44,6 @@ void rxe_cleanup_task(struct rxe_task *task);
  */
 int __rxe_do_task(struct rxe_task *task);
 
-/*
- * common function called by any of the main tasklets
- * If there is any chance that there is additional
- * work to do someone must reschedule the task before
- * leaving
- */
-void rxe_do_task(struct tasklet_struct *t);
-
 void rxe_run_task(struct rxe_task *task);
 
 void rxe_sched_task(struct rxe_task *task);
-- 
2.34.1



* [PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to task->lock
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (3 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 06/16] RDMA/rxe: Make task interface pluggable Bob Pearson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Rename task->state_lock to task->lock.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 18 +++++++++---------
 drivers/infiniband/sw/rxe/rxe_task.h |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 097ddb16c230..42442ede99e8 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -35,22 +35,22 @@ static void do_task(struct tasklet_struct *t)
 	struct rxe_task *task = from_tasklet(task, t, tasklet);
 	unsigned int iterations = RXE_MAX_ITERATIONS;
 
-	spin_lock_bh(&task->state_lock);
+	spin_lock_bh(&task->lock);
 	switch (task->state) {
 	case TASK_STATE_START:
 		task->state = TASK_STATE_BUSY;
-		spin_unlock_bh(&task->state_lock);
+		spin_unlock_bh(&task->lock);
 		break;
 
 	case TASK_STATE_BUSY:
 		task->state = TASK_STATE_ARMED;
 		fallthrough;
 	case TASK_STATE_ARMED:
-		spin_unlock_bh(&task->state_lock);
+		spin_unlock_bh(&task->lock);
 		return;
 
 	default:
-		spin_unlock_bh(&task->state_lock);
+		spin_unlock_bh(&task->lock);
 		pr_warn("%s failed with bad state %d\n", __func__, task->state);
 		return;
 	}
@@ -59,7 +59,7 @@ static void do_task(struct tasklet_struct *t)
 		cont = 0;
 		ret = task->func(task->arg);
 
-		spin_lock_bh(&task->state_lock);
+		spin_lock_bh(&task->lock);
 		switch (task->state) {
 		case TASK_STATE_BUSY:
 			if (ret) {
@@ -88,7 +88,7 @@ static void do_task(struct tasklet_struct *t)
 			pr_warn("%s failed with bad state %d\n", __func__,
 				task->state);
 		}
-		spin_unlock_bh(&task->state_lock);
+		spin_unlock_bh(&task->lock);
 	} while (cont);
 
 	task->ret = ret;
@@ -103,7 +103,7 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
 	tasklet_setup(&task->tasklet, do_task);
 
 	task->state = TASK_STATE_START;
-	spin_lock_init(&task->state_lock);
+	spin_lock_init(&task->lock);
 
 	return 0;
 }
@@ -119,9 +119,9 @@ void rxe_cleanup_task(struct rxe_task *task)
 	task->destroyed = true;
 
 	do {
-		spin_lock_bh(&task->state_lock);
+		spin_lock_bh(&task->lock);
 		idle = (task->state == TASK_STATE_START);
-		spin_unlock_bh(&task->state_lock);
+		spin_unlock_bh(&task->lock);
 	} while (!idle);
 
 	tasklet_kill(&task->tasklet);
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 99e0173e5c46..7b88129702ac 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -21,7 +21,7 @@ enum {
 struct rxe_task {
 	struct tasklet_struct	tasklet;
 	int			state;
-	spinlock_t		state_lock; /* spinlock for task state */
+	spinlock_t		lock;
 	void			*arg;
 	int			(*func)(void *arg);
 	int			ret;
-- 
2.34.1



* [PATCH for-next 06/16] RDMA/rxe: Make task interface pluggable
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (4 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to task->lock Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 07/16] RDMA/rxe: Simplify reset state handling in rxe_resp.c Bob Pearson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Make the internal interface to the task operations pluggable and
add a new 'inline' type.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c   |   8 +-
 drivers/infiniband/sw/rxe/rxe_task.c | 160 ++++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_task.h |  44 +++++---
 3 files changed, 165 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 3f6d62a80bea..b5e108794aa1 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -238,8 +238,10 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->req_pkts);
 
-	rxe_init_task(&qp->req.task, qp, rxe_requester);
-	rxe_init_task(&qp->comp.task, qp, rxe_completer);
+	rxe_init_task(&qp->req.task, qp, rxe_requester, RXE_TASK_TYPE_TASKLET);
+	rxe_init_task(&qp->comp.task, qp, rxe_completer,
+			(qp_type(qp) == IB_QPT_RC) ? RXE_TASK_TYPE_TASKLET :
+						     RXE_TASK_TYPE_INLINE);
 
 	qp->qp_timeout_jiffies = 0; /* Can't be set for UD/UC in modify_qp */
 	if (init->qp_type == IB_QPT_RC) {
@@ -286,7 +288,7 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->resp_pkts);
 
-	rxe_init_task(&qp->resp.task, qp, rxe_responder);
+	rxe_init_task(&qp->resp.task, qp, rxe_responder, RXE_TASK_TYPE_TASKLET);
 
 	qp->resp.opcode		= OPCODE_NONE;
 	qp->resp.msn		= 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 42442ede99e8..fcd87114949f 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -28,12 +28,11 @@ int __rxe_do_task(struct rxe_task *task)
  * a second caller finds the task already running
  * but looks just after the last call to func
  */
-static void do_task(struct tasklet_struct *t)
+static void do_task(struct rxe_task *task)
 {
+	unsigned int iterations = RXE_MAX_ITERATIONS;
 	int cont;
 	int ret;
-	struct rxe_task *task = from_tasklet(task, t, tasklet);
-	unsigned int iterations = RXE_MAX_ITERATIONS;
 
 	spin_lock_bh(&task->lock);
 	switch (task->state) {
@@ -94,28 +93,21 @@ static void do_task(struct tasklet_struct *t)
 	task->ret = ret;
 }
 
-int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
+static void disable_task(struct rxe_task *task)
 {
-	task->arg	= arg;
-	task->func	= func;
-	task->destroyed	= false;
-
-	tasklet_setup(&task->tasklet, do_task);
-
-	task->state = TASK_STATE_START;
-	spin_lock_init(&task->lock);
+	/* todo */
+}
 
-	return 0;
+static void enable_task(struct rxe_task *task)
+{
+	/* todo */
 }
 
-void rxe_cleanup_task(struct rxe_task *task)
+/* busy wait until any previous tasks are done */
+static void cleanup_task(struct rxe_task *task)
 {
 	bool idle;
 
-	/*
-	 * Mark the task, then wait for it to finish. It might be
-	 * running in a non-tasklet (direct call) context.
-	 */
 	task->destroyed = true;
 
 	do {
@@ -123,32 +115,144 @@ void rxe_cleanup_task(struct rxe_task *task)
 		idle = (task->state == TASK_STATE_START);
 		spin_unlock_bh(&task->lock);
 	} while (!idle);
+}
 
-	tasklet_kill(&task->tasklet);
+/* silently treat schedule as inline for inline tasks */
+static void inline_sched(struct rxe_task *task)
+{
+	do_task(task);
 }
 
-void rxe_run_task(struct rxe_task *task)
+static void inline_run(struct rxe_task *task)
 {
-	if (task->destroyed)
-		return;
+	do_task(task);
+}
 
-	do_task(&task->tasklet);
+static void inline_disable(struct rxe_task *task)
+{
+	disable_task(task);
 }
 
-void rxe_sched_task(struct rxe_task *task)
+static void inline_enable(struct rxe_task *task)
 {
-	if (task->destroyed)
-		return;
+	enable_task(task);
+}
+
+static void inline_cleanup(struct rxe_task *task)
+{
+	cleanup_task(task);
+}
+
+static const struct rxe_task_ops inline_ops = {
+	.sched = inline_sched,
+	.run = inline_run,
+	.enable = inline_enable,
+	.disable = inline_disable,
+	.cleanup = inline_cleanup,
+};
 
+static void inline_init(struct rxe_task *task)
+{
+	task->ops = &inline_ops;
+}
+
+/* use tsklet_xxx to avoid name collisions with tasklet_xxx */
+static void tsklet_sched(struct rxe_task *task)
+{
 	tasklet_schedule(&task->tasklet);
 }
 
-void rxe_disable_task(struct rxe_task *task)
+static void tsklet_do_task(struct tasklet_struct *tasklet)
 {
+	struct rxe_task *task = container_of(tasklet, typeof(*task), tasklet);
+
+	do_task(task);
+}
+
+static void tsklet_run(struct rxe_task *task)
+{
+	do_task(task);
+}
+
+static void tsklet_disable(struct rxe_task *task)
+{
+	disable_task(task);
 	tasklet_disable(&task->tasklet);
 }
 
-void rxe_enable_task(struct rxe_task *task)
+static void tsklet_enable(struct rxe_task *task)
 {
 	tasklet_enable(&task->tasklet);
+	enable_task(task);
+}
+
+static void tsklet_cleanup(struct rxe_task *task)
+{
+	cleanup_task(task);
+	tasklet_kill(&task->tasklet);
+}
+
+static const struct rxe_task_ops tsklet_ops = {
+	.sched = tsklet_sched,
+	.run = tsklet_run,
+	.enable = tsklet_enable,
+	.disable = tsklet_disable,
+	.cleanup = tsklet_cleanup,
+};
+
+static void tsklet_init(struct rxe_task *task)
+{
+	tasklet_setup(&task->tasklet, tsklet_do_task);
+	task->ops = &tsklet_ops;
+}
+
+int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
+		  enum rxe_task_type type)
+{
+	task->arg	= arg;
+	task->func	= func;
+	task->destroyed	= false;
+	task->type	= type;
+	task->state	= TASK_STATE_START;
+
+	spin_lock_init(&task->lock);
+
+	switch (type) {
+	case RXE_TASK_TYPE_INLINE:
+		inline_init(task);
+		break;
+	case RXE_TASK_TYPE_TASKLET:
+		tsklet_init(task);
+		break;
+	default:
+		pr_debug("%s: invalid task type = %d\n", __func__, type);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+void rxe_sched_task(struct rxe_task *task)
+{
+	task->ops->sched(task);
+}
+
+void rxe_run_task(struct rxe_task *task)
+{
+	task->ops->run(task);
+}
+
+void rxe_enable_task(struct rxe_task *task)
+{
+	task->ops->enable(task);
+}
+
+void rxe_disable_task(struct rxe_task *task)
+{
+	task->ops->disable(task);
+}
+
+void rxe_cleanup_task(struct rxe_task *task)
+{
+	task->ops->cleanup(task);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 7b88129702ac..31963129ff7a 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -7,6 +7,21 @@
 #ifndef RXE_TASK_H
 #define RXE_TASK_H
 
+struct rxe_task;
+
+struct rxe_task_ops {
+	void (*sched)(struct rxe_task *task);
+	void (*run)(struct rxe_task *task);
+	void (*disable)(struct rxe_task *task);
+	void (*enable)(struct rxe_task *task);
+	void (*cleanup)(struct rxe_task *task);
+};
+
+enum rxe_task_type {
+	RXE_TASK_TYPE_INLINE	= 0,
+	RXE_TASK_TYPE_TASKLET	= 1,
+};
+
 enum {
 	TASK_STATE_START	= 0,
 	TASK_STATE_BUSY		= 1,
@@ -19,24 +34,19 @@ enum {
  * called again.
  */
 struct rxe_task {
-	struct tasklet_struct	tasklet;
-	int			state;
-	spinlock_t		lock;
-	void			*arg;
-	int			(*func)(void *arg);
-	int			ret;
-	bool			destroyed;
+	struct tasklet_struct		tasklet;
+	int				state;
+	spinlock_t			lock;
+	void				*arg;
+	int				(*func)(void *arg);
+	int				ret;
+	bool				destroyed;
+	const struct rxe_task_ops	*ops;
+	enum rxe_task_type		type;
 };
 
-/*
- * init rxe_task structure
- *	arg  => parameter to pass to fcn
- *	func => function to call until it returns != 0
- */
-int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *));
-
-/* cleanup task */
-void rxe_cleanup_task(struct rxe_task *task);
+int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
+		  enum rxe_task_type type);
 
 /*
  * raw call to func in loop without any checking
@@ -54,4 +64,6 @@ void rxe_disable_task(struct rxe_task *task);
 /* allow task to run */
 void rxe_enable_task(struct rxe_task *task);
 
+void rxe_cleanup_task(struct rxe_task *task);
+
 #endif /* RXE_TASK_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH for-next 07/16] RDMA/rxe: Simplify reset state handling in rxe_resp.c
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (5 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 06/16] RDMA/rxe: Make task interface pluggable Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 08/16] RDMA/rxe: Split rxe_drain_resp_pkts() Bob Pearson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Make rxe_responder() more like rxe_completer() and take qp reset
handling out of the state machine.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 2a2a50de51b2..dd11dea70bbf 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -40,7 +40,6 @@ enum resp_states {
 	RESPST_ERR_LENGTH,
 	RESPST_ERR_CQ_OVERFLOW,
 	RESPST_ERROR,
-	RESPST_RESET,
 	RESPST_DONE,
 	RESPST_EXIT,
 };
@@ -75,7 +74,6 @@ static char *resp_state_name[] = {
 	[RESPST_ERR_LENGTH]			= "ERR_LENGTH",
 	[RESPST_ERR_CQ_OVERFLOW]		= "ERR_CQ_OVERFLOW",
 	[RESPST_ERROR]				= "ERROR",
-	[RESPST_RESET]				= "RESET",
 	[RESPST_DONE]				= "DONE",
 	[RESPST_EXIT]				= "EXIT",
 };
@@ -1278,8 +1276,9 @@ int rxe_responder(void *arg)
 
 	switch (qp->resp.state) {
 	case QP_STATE_RESET:
-		state = RESPST_RESET;
-		break;
+		rxe_drain_req_pkts(qp, false);
+		qp->resp.wqe = NULL;
+		goto exit;
 
 	default:
 		state = RESPST_GET_REQ;
@@ -1438,11 +1437,6 @@ int rxe_responder(void *arg)
 
 			goto exit;
 
-		case RESPST_RESET:
-			rxe_drain_req_pkts(qp, false);
-			qp->resp.wqe = NULL;
-			goto exit;
-
 		case RESPST_ERROR:
 			qp->resp.goto_error = 0;
 			pr_debug("qp#%d moved to error state\n", qp_num(qp));
-- 
2.34.1



* [PATCH for-next 08/16] RDMA/rxe: Split rxe_drain_resp_pkts()
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (6 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 07/16] RDMA/rxe: Simplify reset state handling in rxe_resp.c Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 09/16] RDMA/rxe: Handle qp error in rxe_resp.c Bob Pearson
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Split rxe_drain_resp_pkts() into two subroutines, rxe_drain_resp_pkts()
and rxe_drain_send_queue(), which drain the response packet queue and
flush the send queue respectively.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d2a250123617..5d434cce2b69 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -524,17 +524,21 @@ static inline enum comp_state complete_wqe(struct rxe_qp *qp,
 	return COMPST_GET_WQE;
 }
 
-static void rxe_drain_resp_pkts(struct rxe_qp *qp, bool notify)
+static void rxe_drain_resp_pkts(struct rxe_qp *qp)
 {
 	struct sk_buff *skb;
-	struct rxe_send_wqe *wqe;
-	struct rxe_queue *q = qp->sq.queue;
 
 	while ((skb = skb_dequeue(&qp->resp_pkts))) {
 		rxe_put(qp);
 		kfree_skb(skb);
 		ib_device_put(qp->ibqp.device);
 	}
+}
+
+static void rxe_drain_send_queue(struct rxe_qp *qp, bool notify)
+{
+	struct rxe_send_wqe *wqe;
+	struct rxe_queue *q = qp->sq.queue;
 
 	while ((wqe = queue_head(q, q->type))) {
 		if (notify) {
@@ -565,6 +569,7 @@ int rxe_completer(void *arg)
 	struct sk_buff *skb = NULL;
 	struct rxe_pkt_info *pkt = NULL;
 	enum comp_state state;
+	bool notify;
 	int ret;
 
 	if (!rxe_get(qp))
@@ -572,8 +577,9 @@ int rxe_completer(void *arg)
 
 	if (!qp->valid || qp->comp.state == QP_STATE_ERROR ||
 	    qp->comp.state == QP_STATE_RESET) {
-		rxe_drain_resp_pkts(qp, qp->valid &&
-				    qp->comp.state == QP_STATE_ERROR);
+		notify = qp->valid && (qp->comp.state == QP_STATE_ERROR);
+		rxe_drain_resp_pkts(qp);
+		rxe_drain_send_queue(qp, notify);
 		goto exit;
 	}
 
-- 
2.34.1



* [PATCH for-next 09/16] RDMA/rxe: Handle qp error in rxe_resp.c
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (7 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 08/16] RDMA/rxe: Split rxe_drain_resp_pkts() Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 10/16] RDMA/rxe: Cleanup comp tasks in rxe_qp.c Bob Pearson
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Split rxe_drain_req_pkts() into two subroutines, rxe_drain_req_pkts()
and rxe_drain_recv_queue(), which perform separate tasks. Handle the
qp error and reset states and the !qp->valid case in the same way as
rxe_comp.c. Flush the recv wqes for a qp in the error state.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 64 ++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index dd11dea70bbf..0bcdd1154641 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1025,7 +1025,6 @@ static enum resp_states do_complete(struct rxe_qp *qp,
 		return RESPST_CLEANUP;
 }
 
-
 static int send_common_ack(struct rxe_qp *qp, u8 syndrome, u32 psn,
 				  int opcode, const char *msg)
 {
@@ -1240,22 +1239,56 @@ static enum resp_states do_class_d1e_error(struct rxe_qp *qp)
 	}
 }
 
-static void rxe_drain_req_pkts(struct rxe_qp *qp, bool notify)
+static void rxe_drain_req_pkts(struct rxe_qp *qp)
 {
 	struct sk_buff *skb;
-	struct rxe_queue *q = qp->rq.queue;
 
 	while ((skb = skb_dequeue(&qp->req_pkts))) {
 		rxe_put(qp);
 		kfree_skb(skb);
 		ib_device_put(qp->ibqp.device);
 	}
+}
+
+static int complete_flush(struct rxe_qp *qp, struct rxe_recv_wqe *wqe)
+{
+	struct rxe_cqe cqe;
+	struct ib_wc *wc = &cqe.ibwc;
+	struct ib_uverbs_wc *uwc = &cqe.uibwc;
+
+	memset(&cqe, 0, sizeof(cqe));
 
-	if (notify)
-		return;
+	if (qp->rcq->is_user) {
+		uwc->status		= IB_WC_WR_FLUSH_ERR;
+		uwc->qp_num		= qp->ibqp.qp_num;
+		uwc->wr_id		= wqe->wr_id;
+	} else {
+		wc->status		= IB_WC_WR_FLUSH_ERR;
+		wc->qp			= &qp->ibqp;
+		wc->wr_id		= wqe->wr_id;
+	}
 
-	while (!qp->srq && q && queue_head(q, q->type))
+	if (rxe_cq_post(qp->rcq, &cqe, 0))
+		return -ENOMEM;
+
+	return 0;
+}
+
+/* drain the receive queue. Complete each wqe with a flush error
+ * if notify is true or until a cq overflow occurs.
+ */
+static void rxe_drain_recv_queue(struct rxe_qp *qp, bool notify)
+{
+	struct rxe_recv_wqe *wqe;
+	struct rxe_queue *q = qp->rq.queue;
+
+	while ((wqe = queue_head(q, q->type))) {
+		if (notify && complete_flush(qp, wqe))
+			notify = 0;
 		queue_advance_consumer(q, q->type);
+	}
+
+	qp->resp.wqe = NULL;
 }
 
 int rxe_responder(void *arg)
@@ -1264,6 +1297,7 @@ int rxe_responder(void *arg)
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	enum resp_states state;
 	struct rxe_pkt_info *pkt = NULL;
+	bool notify;
 	int ret;
 
 	if (!rxe_get(qp))
@@ -1271,20 +1305,16 @@ int rxe_responder(void *arg)
 
 	qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
 
-	if (!qp->valid)
-		goto exit;
-
-	switch (qp->resp.state) {
-	case QP_STATE_RESET:
-		rxe_drain_req_pkts(qp, false);
-		qp->resp.wqe = NULL;
+	if (!qp->valid || qp->resp.state == QP_STATE_ERROR ||
+	    qp->resp.state == QP_STATE_RESET) {
+		notify = qp->valid && (qp->resp.state == QP_STATE_ERROR);
+		rxe_drain_req_pkts(qp);
+		rxe_drain_recv_queue(qp, notify);
 		goto exit;
-
-	default:
-		state = RESPST_GET_REQ;
-		break;
 	}
 
+	state = RESPST_GET_REQ;
+
 	while (1) {
 		pr_debug("qp#%d state = %s\n", qp_num(qp),
 			 resp_state_name[state]);
-- 
2.34.1



* [PATCH for-next 10/16] RDMA/rxe: Cleanup comp tasks in rxe_qp.c
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (8 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 09/16] RDMA/rxe: Handle qp error in rxe_resp.c Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 11/16] RDMA/rxe: Remove __rxe_do_task() Bob Pearson
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Take advantage of the inline task behavior to clean up the completer
task code in rxe_qp.c.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index b5e108794aa1..3691eb97c576 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -480,8 +480,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 
 	/* stop request/comp */
 	if (qp->sq.queue) {
-		if (qp_type(qp) == IB_QPT_RC)
-			rxe_disable_task(&qp->comp.task);
+		rxe_disable_task(&qp->comp.task);
 		rxe_disable_task(&qp->req.task);
 	}
 
@@ -524,9 +523,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 	rxe_enable_task(&qp->resp.task);
 
 	if (qp->sq.queue) {
-		if (qp_type(qp) == IB_QPT_RC)
-			rxe_enable_task(&qp->comp.task);
-
+		rxe_enable_task(&qp->comp.task);
 		rxe_enable_task(&qp->req.task);
 	}
 }
@@ -537,10 +534,7 @@ static void rxe_qp_drain(struct rxe_qp *qp)
 	if (qp->sq.queue) {
 		if (qp->req.state != QP_STATE_DRAINED) {
 			qp->req.state = QP_STATE_DRAIN;
-			if (qp_type(qp) == IB_QPT_RC)
-				rxe_sched_task(&qp->comp.task);
-			else
-				__rxe_do_task(&qp->comp.task);
+			rxe_sched_task(&qp->comp.task);
 			rxe_sched_task(&qp->req.task);
 		}
 	}
@@ -556,11 +550,7 @@ void rxe_qp_error(struct rxe_qp *qp)
 
 	/* drain work and packet queues */
 	rxe_sched_task(&qp->resp.task);
-
-	if (qp_type(qp) == IB_QPT_RC)
-		rxe_sched_task(&qp->comp.task);
-	else
-		__rxe_do_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 	rxe_sched_task(&qp->req.task);
 }
 
-- 
2.34.1



* [PATCH for-next 11/16] RDMA/rxe: Remove __rxe_do_task()
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (9 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 10/16] RDMA/rxe: Cleanup comp tasks in rxe_qp.c Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 12/16] RDMA/rxe: Make tasks schedule each other Bob Pearson
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

The subroutine __rxe_do_task() is not thread safe and is only used
by the rxe_qp_reset() and rxe_qp_do_cleanup() routines. After the
changes to error handling in the tasklet functions, the queues can be
drained by calling the task functions once directly, outside of the
tasklet code. This allows __rxe_do_task() to be removed.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c   | 60 ++++++++++++++--------------
 drivers/infiniband/sw/rxe/rxe_task.c | 13 ------
 drivers/infiniband/sw/rxe/rxe_task.h |  6 ---
 3 files changed, 29 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 3691eb97c576..50f6b8b8ad9d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -477,28 +477,23 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 {
 	/* stop tasks from running */
 	rxe_disable_task(&qp->resp.task);
-
-	/* stop request/comp */
-	if (qp->sq.queue) {
-		rxe_disable_task(&qp->comp.task);
-		rxe_disable_task(&qp->req.task);
-	}
+	rxe_disable_task(&qp->comp.task);
+	rxe_disable_task(&qp->req.task);
 
 	/* move qp to the reset state */
 	qp->req.state = QP_STATE_RESET;
 	qp->comp.state = QP_STATE_RESET;
 	qp->resp.state = QP_STATE_RESET;
 
-	/* let state machines reset themselves drain work and packet queues
-	 * etc.
-	 */
-	__rxe_do_task(&qp->resp.task);
+	/* drain work and packet queues */
+	rxe_responder(qp);
+	rxe_completer(qp);
+	rxe_requester(qp);
 
-	if (qp->sq.queue) {
-		__rxe_do_task(&qp->comp.task);
-		__rxe_do_task(&qp->req.task);
+	if (qp->rq.queue)
+		rxe_queue_reset(qp->rq.queue);
+	if (qp->sq.queue)
 		rxe_queue_reset(qp->sq.queue);
-	}
 
 	/* cleanup attributes */
 	atomic_set(&qp->ssn, 0);
@@ -521,11 +516,8 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 
 	/* reenable tasks */
 	rxe_enable_task(&qp->resp.task);
-
-	if (qp->sq.queue) {
-		rxe_enable_task(&qp->comp.task);
-		rxe_enable_task(&qp->req.task);
-	}
+	rxe_enable_task(&qp->comp.task);
+	rxe_enable_task(&qp->req.task);
 }
 
 /* drain the send queue */
@@ -543,15 +535,25 @@ static void rxe_qp_drain(struct rxe_qp *qp)
 /* move the qp to the error state */
 void rxe_qp_error(struct rxe_qp *qp)
 {
+	/* stop tasks from running */
+	rxe_disable_task(&qp->resp.task);
+	rxe_disable_task(&qp->comp.task);
+	rxe_disable_task(&qp->req.task);
+
 	qp->req.state = QP_STATE_ERROR;
 	qp->resp.state = QP_STATE_ERROR;
 	qp->comp.state = QP_STATE_ERROR;
 	qp->attr.qp_state = IB_QPS_ERR;
 
 	/* drain work and packet queues */
-	rxe_sched_task(&qp->resp.task);
-	rxe_sched_task(&qp->comp.task);
-	rxe_sched_task(&qp->req.task);
+	rxe_responder(qp);
+	rxe_completer(qp);
+	rxe_requester(qp);
+
+	/* reenable tasks */
+	rxe_enable_task(&qp->resp.task);
+	rxe_enable_task(&qp->comp.task);
+	rxe_enable_task(&qp->req.task);
 }
 
 /* called by the modify qp verb */
@@ -770,24 +772,20 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
 
 	qp->valid = 0;
 	qp->qp_timeout_jiffies = 0;
-	rxe_cleanup_task(&qp->resp.task);
 
 	if (qp_type(qp) == IB_QPT_RC) {
 		del_timer_sync(&qp->retrans_timer);
 		del_timer_sync(&qp->rnr_nak_timer);
 	}
 
+	rxe_cleanup_task(&qp->resp.task);
 	rxe_cleanup_task(&qp->req.task);
 	rxe_cleanup_task(&qp->comp.task);
 
-	/* flush out any receive wr's or pending requests */
-	if (qp->req.task.func)
-		__rxe_do_task(&qp->req.task);
-
-	if (qp->sq.queue) {
-		__rxe_do_task(&qp->comp.task);
-		__rxe_do_task(&qp->req.task);
-	}
+	/* drain any receive wr's or pending requests */
+	rxe_responder(qp);
+	rxe_completer(qp);
+	rxe_requester(qp);
 
 	if (qp->sq.queue)
 		rxe_queue_cleanup(qp->sq.queue);
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index fcd87114949f..75ec195f4176 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,19 +10,6 @@
 
 #include "rxe.h"
 
-int __rxe_do_task(struct rxe_task *task)
-
-{
-	int ret;
-
-	while ((ret = task->func(task->arg)) == 0)
-		;
-
-	task->ret = ret;
-
-	return ret;
-}
-
 /*
  * this locking is due to a potential race where
  * a second caller finds the task already running
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 31963129ff7a..d594468fcf56 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -48,12 +48,6 @@ struct rxe_task {
 int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 		  enum rxe_task_type type);
 
-/*
- * raw call to func in loop without any checking
- * can call when tasklets are disabled
- */
-int __rxe_do_task(struct rxe_task *task);
-
 void rxe_run_task(struct rxe_task *task);
 
 void rxe_sched_task(struct rxe_task *task);
-- 
2.34.1



* [PATCH for-next 12/16] RDMA/rxe: Make tasks schedule each other
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (10 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 11/16] RDMA/rxe: Remove __rxe_do_task() Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 13/16] RDMA/rxe: Implement disable/enable_task() Bob Pearson
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Replace rxe_run_task() with rxe_sched_task() at the call sites where
tasks call each other. These calls are not performance critical and
mainly occur on error paths, but running the other task inline risks
causing deadlocks.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++----
 drivers/infiniband/sw/rxe/rxe_req.c  | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 5d434cce2b69..6c15c9307660 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -308,7 +308,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task);
+						rxe_sched_task(&qp->req.task);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -455,7 +455,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task);
+		rxe_sched_task(&qp->req.task);
 	}
 }
 
@@ -469,7 +469,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task);
+			rxe_sched_task(&qp->req.task);
 		}
 	}
 
@@ -723,7 +723,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task);
+					rxe_sched_task(&qp->req.task);
 				}
 				goto done;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 41f1d84f0acb..fba7572e1d0c 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -733,7 +733,7 @@ int rxe_requester(void *arg)
 						       qp->req.wqe_index);
 			wqe->state = wqe_state_done;
 			wqe->status = IB_WC_SUCCESS;
-			rxe_run_task(&qp->comp.task);
+			rxe_sched_task(&qp->comp.task);
 			goto done;
 		}
 		payload = mtu;
@@ -817,7 +817,7 @@ int rxe_requester(void *arg)
 	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
 	qp->req.state = QP_STATE_ERROR;
-	rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 exit:
 	ret = -EAGAIN;
 out:
-- 
2.34.1



* [PATCH for-next 13/16] RDMA/rxe: Implement disable/enable_task()
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (11 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 12/16] RDMA/rxe: Make tasks schedule each other Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 14/16] RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE Bob Pearson
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Implement common disable_task() and enable_task() routines by adding
a new PAUSED state to the do_task() state machine. These replace
tasklet_disable() and tasklet_enable() with code that can be shared
by all the task types. Move the rxe_sched_task() call that reschedules
the task outside of the locks to avoid a deadlock.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 107 ++++++++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_task.h |   1 +
 2 files changed, 66 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 75ec195f4176..a9eb66d69cb7 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,36 +10,46 @@
 
 #include "rxe.h"
 
-/*
- * this locking is due to a potential race where
- * a second caller finds the task already running
- * but looks just after the last call to func
- */
-static void do_task(struct rxe_task *task)
+static bool task_is_idle(struct rxe_task *task)
 {
-	unsigned int iterations = RXE_MAX_ITERATIONS;
-	int cont;
-	int ret;
+	if (task->destroyed)
+		return false;
 
 	spin_lock_bh(&task->lock);
 	switch (task->state) {
 	case TASK_STATE_START:
 		task->state = TASK_STATE_BUSY;
 		spin_unlock_bh(&task->lock);
-		break;
-
+		return true;
 	case TASK_STATE_BUSY:
 		task->state = TASK_STATE_ARMED;
 		fallthrough;
 	case TASK_STATE_ARMED:
-		spin_unlock_bh(&task->lock);
-		return;
-
+	case TASK_STATE_PAUSED:
+		break;
 	default:
+		WARN_ON(1);
+		break;
+	}
+	spin_unlock_bh(&task->lock);
+
+	return false;
+}
+
+static void do_task(struct rxe_task *task)
+{
+	unsigned int iterations = RXE_MAX_ITERATIONS;
+	bool resched = false;
+	int cont;
+	int ret;
+
+	/* flush out pending tasks */
+	spin_lock_bh(&task->lock);
+	if (task->state == TASK_STATE_PAUSED) {
 		spin_unlock_bh(&task->lock);
-		pr_warn("%s failed with bad state %d\n", __func__, task->state);
 		return;
 	}
+	spin_unlock_bh(&task->lock);
 
 	do {
 		cont = 0;
@@ -47,47 +57,52 @@ static void do_task(struct rxe_task *task)
 
 		spin_lock_bh(&task->lock);
 		switch (task->state) {
+		case TASK_STATE_START:
 		case TASK_STATE_BUSY:
 			if (ret) {
 				task->state = TASK_STATE_START;
-			} else if (iterations--) {
+			} else if (task->type == RXE_TASK_TYPE_INLINE ||
+					iterations--) {
 				cont = 1;
 			} else {
-				/* reschedule the tasklet and exit
-				 * the loop to give up the cpu
-				 */
-				tasklet_schedule(&task->tasklet);
 				task->state = TASK_STATE_START;
+				resched = true;
 			}
 			break;
-
-		/* someone tried to run the task since the last time we called
-		 * func, so we will call one more time regardless of the
-		 * return value
-		 */
 		case TASK_STATE_ARMED:
 			task->state = TASK_STATE_BUSY;
 			cont = 1;
 			break;
-
+		case TASK_STATE_PAUSED:
+			break;
 		default:
-			pr_warn("%s failed with bad state %d\n", __func__,
-				task->state);
+			WARN_ON(1);
+			break;
 		}
 		spin_unlock_bh(&task->lock);
 	} while (cont);
 
+	if (resched)
+		rxe_sched_task(task);
+
 	task->ret = ret;
 }
 
 static void disable_task(struct rxe_task *task)
 {
-	/* todo */
+	spin_lock_bh(&task->lock);
+	task->state = TASK_STATE_PAUSED;
+	spin_unlock_bh(&task->lock);
 }
 
 static void enable_task(struct rxe_task *task)
 {
-	/* todo */
+	spin_lock_bh(&task->lock);
+	task->state = TASK_STATE_START;
+	spin_unlock_bh(&task->lock);
+
+	/* restart the task in case it was scheduled while disabled */
+	rxe_run_task(task);
 }
 
 /* busy wait until any previous tasks are done */
@@ -99,7 +114,8 @@ static void cleanup_task(struct rxe_task *task)
 
 	do {
 		spin_lock_bh(&task->lock);
-		idle = (task->state == TASK_STATE_START);
+		idle = (task->state == TASK_STATE_START ||
+			task->state == TASK_STATE_PAUSED);
 		spin_unlock_bh(&task->lock);
 	} while (!idle);
 }
@@ -107,22 +123,26 @@ static void cleanup_task(struct rxe_task *task)
 /* silently treat schedule as inline for inline tasks */
 static void inline_sched(struct rxe_task *task)
 {
-	do_task(task);
+	if (task_is_idle(task))
+		do_task(task);
 }
 
 static void inline_run(struct rxe_task *task)
 {
-	do_task(task);
+	if (task_is_idle(task))
+		do_task(task);
 }
 
 static void inline_disable(struct rxe_task *task)
 {
-	disable_task(task);
+	if (!task->destroyed)
+		disable_task(task);
 }
 
 static void inline_enable(struct rxe_task *task)
 {
-	enable_task(task);
+	if (!task->destroyed)
+		enable_task(task);
 }
 
 static void inline_cleanup(struct rxe_task *task)
@@ -146,31 +166,34 @@ static void inline_init(struct rxe_task *task)
 /* use tsklet_xxx to avoid name collisions with tasklet_xxx */
 static void tsklet_sched(struct rxe_task *task)
 {
-	tasklet_schedule(&task->tasklet);
+	if (task_is_idle(task))
+		tasklet_schedule(&task->tasklet);
 }
 
 static void tsklet_do_task(struct tasklet_struct *tasklet)
 {
 	struct rxe_task *task = container_of(tasklet, typeof(*task), tasklet);
 
-	do_task(task);
+	if (!task->destroyed)
+		do_task(task);
 }
 
 static void tsklet_run(struct rxe_task *task)
 {
-	do_task(task);
+	if (task_is_idle(task))
+		do_task(task);
 }
 
 static void tsklet_disable(struct rxe_task *task)
 {
-	disable_task(task);
-	tasklet_disable(&task->tasklet);
+	if (!task->destroyed)
+		disable_task(task);
 }
 
 static void tsklet_enable(struct rxe_task *task)
 {
-	tasklet_enable(&task->tasklet);
-	enable_task(task);
+	if (!task->destroyed)
+		enable_task(task);
 }
 
 static void tsklet_cleanup(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index d594468fcf56..792832786456 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -26,6 +26,7 @@ enum {
 	TASK_STATE_START	= 0,
 	TASK_STATE_BUSY		= 1,
 	TASK_STATE_ARMED	= 2,
+	TASK_STATE_PAUSED	= 3,
 };
 
 /*
-- 
2.34.1



* [PATCH for-next 14/16] RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (12 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 13/16] RDMA/rxe: Implement disable/enable_task() Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  4:33 ` [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks Bob Pearson
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Replace the enum TASK_STATE_START with TASK_STATE_IDLE and replace the
task->destroyed flag with a task->valid flag that has the opposite
sense.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 37 ++++++++++++++--------------
 drivers/infiniband/sw/rxe/rxe_task.h |  4 +--
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index a9eb66d69cb7..a2c58ce66c8f 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -12,12 +12,12 @@
 
 static bool task_is_idle(struct rxe_task *task)
 {
-	if (task->destroyed)
+	if (!task->valid)
 		return false;
 
 	spin_lock_bh(&task->lock);
 	switch (task->state) {
-	case TASK_STATE_START:
+	case TASK_STATE_IDLE:
 		task->state = TASK_STATE_BUSY;
 		spin_unlock_bh(&task->lock);
 		return true;
@@ -57,15 +57,15 @@ static void do_task(struct rxe_task *task)
 
 		spin_lock_bh(&task->lock);
 		switch (task->state) {
-		case TASK_STATE_START:
+		case TASK_STATE_IDLE:
 		case TASK_STATE_BUSY:
 			if (ret) {
-				task->state = TASK_STATE_START;
+				task->state = TASK_STATE_IDLE;
 			} else if (task->type == RXE_TASK_TYPE_INLINE ||
 					iterations--) {
 				cont = 1;
 			} else {
-				task->state = TASK_STATE_START;
+				task->state = TASK_STATE_IDLE;
 				resched = true;
 			}
 			break;
@@ -98,7 +98,7 @@ static void disable_task(struct rxe_task *task)
 static void enable_task(struct rxe_task *task)
 {
 	spin_lock_bh(&task->lock);
-	task->state = TASK_STATE_START;
+	task->state = TASK_STATE_IDLE;
 	spin_unlock_bh(&task->lock);
 
 	/* restart task in case */
@@ -110,11 +110,11 @@ static void cleanup_task(struct rxe_task *task)
 {
 	bool idle;
 
-	task->destroyed = true;
+	task->valid = false;
 
 	do {
 		spin_lock_bh(&task->lock);
-		idle = (task->state == TASK_STATE_START ||
+		idle = (task->state == TASK_STATE_IDLE ||
 			task->state == TASK_STATE_PAUSED);
 		spin_unlock_bh(&task->lock);
 	} while (!idle);
@@ -135,13 +135,13 @@ static void inline_run(struct rxe_task *task)
 
 static void inline_disable(struct rxe_task *task)
 {
-	if (!task->destroyed)
+	if (task->valid)
 		disable_task(task);
 }
 
 static void inline_enable(struct rxe_task *task)
 {
-	if (!task->destroyed)
+	if (task->valid)
 		enable_task(task);
 }
 
@@ -174,7 +174,7 @@ static void tsklet_do_task(struct tasklet_struct *tasklet)
 {
 	struct rxe_task *task = container_of(tasklet, typeof(*task), tasklet);
 
-	if (!task->destroyed)
+	if (task->valid)
 		do_task(task);
 }
 
@@ -186,13 +186,13 @@ static void tsklet_run(struct rxe_task *task)
 
 static void tsklet_disable(struct rxe_task *task)
 {
-	if (!task->destroyed)
+	if (task->valid)
 		disable_task(task);
 }
 
 static void tsklet_enable(struct rxe_task *task)
 {
-	if (!task->destroyed)
+	if (task->valid)
 		enable_task(task);
 }
 
@@ -219,11 +219,10 @@ static void tsklet_init(struct rxe_task *task)
 int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 		  enum rxe_task_type type)
 {
-	task->arg	= arg;
-	task->func	= func;
-	task->destroyed	= false;
-	task->type	= type;
-	task->state	= TASK_STATE_START;
+	task->arg = arg;
+	task->func = func;
+	task->type = type;
+	task->state = TASK_STATE_IDLE;
 
 	spin_lock_init(&task->lock);
 
@@ -239,6 +238,8 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 		return -EINVAL;
 	}
 
+	task->valid = true;
+
 	return 0;
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 792832786456..c81947e88629 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -23,7 +23,7 @@ enum rxe_task_type {
 };
 
 enum {
-	TASK_STATE_START	= 0,
+	TASK_STATE_IDLE		= 0,
 	TASK_STATE_BUSY		= 1,
 	TASK_STATE_ARMED	= 2,
 	TASK_STATE_PAUSED	= 3,
@@ -41,7 +41,7 @@ struct rxe_task {
 	void				*arg;
 	int				(*func)(void *arg);
 	int				ret;
-	bool				destroyed;
+	bool				valid;
 	const struct rxe_task_ops	*ops;
 	enum rxe_task_type		type;
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (13 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 14/16] RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  8:59   ` Leon Romanovsky
  2022-10-18  4:33 ` [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type Bob Pearson
  2022-10-20 15:02 ` [PATCH for-next 00/16] Implement work queues for rdma_rxe haris iqbal
  16 siblings, 1 reply; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Add a third task type RXE_TASK_TYPE_WORKQUEUE to rxe_task.c.

Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c      |  9 ++-
 drivers/infiniband/sw/rxe/rxe_task.c | 84 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_task.h | 10 +++-
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 51daac5c4feb..6d80218334ca 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -210,10 +210,16 @@ static int __init rxe_module_init(void)
 {
 	int err;
 
-	err = rxe_net_init();
+	err = rxe_alloc_wq();
 	if (err)
 		return err;
 
+	err = rxe_net_init();
+	if (err) {
+		rxe_destroy_wq();
+		return err;
+	}
+
 	rdma_link_register(&rxe_link_ops);
 	pr_info("loaded\n");
 	return 0;
@@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void)
 	rdma_link_unregister(&rxe_link_ops);
 	ib_unregister_driver(RDMA_DRIVER_RXE);
 	rxe_net_exit();
+	rxe_destroy_wq();
 
 	pr_info("unloaded\n");
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index a2c58ce66c8f..ea33ea3bc0b1 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,6 +10,25 @@
 
 #include "rxe.h"
 
+static struct workqueue_struct *rxe_wq;
+
+int rxe_alloc_wq(void)
+{
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_MEM_RECLAIM |
+				WQ_HIGHPRI | WQ_CPU_INTENSIVE |
+				WQ_SYSFS, WQ_MAX_ACTIVE);
+
+	if (!rxe_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void rxe_destroy_wq(void)
+{
+	destroy_workqueue(rxe_wq);
+}
+
 static bool task_is_idle(struct rxe_task *task)
 {
 	if (!task->valid)
@@ -216,6 +235,68 @@ static void tsklet_init(struct rxe_task *task)
 	task->ops = &tsklet_ops;
 }
 
+static void work_sched(struct rxe_task *task)
+{
+	if (!task->valid)
+		return;
+
+	queue_work(rxe_wq, &task->work);
+}
+
+static void work_do_task(struct work_struct *work)
+{
+	struct rxe_task *task = container_of(work, typeof(*task), work);
+
+	if (!task->valid)
+		return;
+
+	do_task(task);
+}
+
+static void work_run(struct rxe_task *task)
+{
+	if (!task->valid)
+		return;
+
+	do_task(task);
+}
+
+static void work_enable(struct rxe_task *task)
+{
+	if (!task->valid)
+		return;
+
+	enable_task(task);
+}
+
+static void work_disable(struct rxe_task *task)
+{
+	if (!task->valid)
+		return;
+
+	disable_task(task);
+	flush_workqueue(rxe_wq);
+}
+
+static void work_cleanup(struct rxe_task *task)
+{
+	cleanup_task(task);
+}
+
+static const struct rxe_task_ops work_ops = {
+	.sched = work_sched,
+	.run = work_run,
+	.enable = work_enable,
+	.disable = work_disable,
+	.cleanup = work_cleanup,
+};
+
+static void work_init(struct rxe_task *task)
+{
+	INIT_WORK(&task->work, work_do_task);
+	task->ops = &work_ops;
+}
+
 int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 		  enum rxe_task_type type)
 {
@@ -233,6 +314,9 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 	case RXE_TASK_TYPE_TASKLET:
 		tsklet_init(task);
 		break;
+	case RXE_TASK_TYPE_WORKQUEUE:
+		work_init(task);
+		break;
 	default:
 		pr_debug("%s: invalid task type = %d\n", __func__, type);
 		return -EINVAL;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index c81947e88629..4887ca566769 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -20,6 +20,7 @@ struct rxe_task_ops {
 enum rxe_task_type {
 	RXE_TASK_TYPE_INLINE	= 0,
 	RXE_TASK_TYPE_TASKLET	= 1,
+	RXE_TASK_TYPE_WORKQUEUE	= 2,
 };
 
 enum {
@@ -35,7 +36,10 @@ enum {
  * called again.
  */
 struct rxe_task {
-	struct tasklet_struct		tasklet;
+	union {
+		struct tasklet_struct		tasklet;
+		struct work_struct		work;
+	};
 	int				state;
 	spinlock_t			lock;
 	void				*arg;
@@ -46,6 +50,10 @@ struct rxe_task {
 	enum rxe_task_type		type;
 };
 
+int rxe_alloc_wq(void);
+
+void rxe_destroy_wq(void);
+
 int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *),
 		  enum rxe_task_type type);
 
-- 
2.34.1



* [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (14 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks Bob Pearson
@ 2022-10-18  4:33 ` Bob Pearson
  2022-10-18  9:02   ` Leon Romanovsky
  2022-10-20 15:02 ` [PATCH for-next 00/16] Implement work queues for rdma_rxe haris iqbal
  16 siblings, 1 reply; 29+ messages in thread
From: Bob Pearson @ 2022-10-18  4:33 UTC (permalink / raw)
  To: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba
  Cc: Bob Pearson

Add modparams to control the task types for req, comp, and resp
tasks.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_qp.c   | 6 +++---
 drivers/infiniband/sw/rxe/rxe_task.c | 8 ++++++++
 drivers/infiniband/sw/rxe/rxe_task.h | 4 ++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 50f6b8b8ad9d..673d52271062 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -238,9 +238,9 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->req_pkts);
 
-	rxe_init_task(&qp->req.task, qp, rxe_requester, RXE_TASK_TYPE_TASKLET);
+	rxe_init_task(&qp->req.task, qp, rxe_requester, rxe_req_task_type);
 	rxe_init_task(&qp->comp.task, qp, rxe_completer,
-			(qp_type(qp) == IB_QPT_RC) ? RXE_TASK_TYPE_TASKLET :
+			(qp_type(qp) == IB_QPT_RC) ? rxe_comp_task_type :
 						     RXE_TASK_TYPE_INLINE);
 
 	qp->qp_timeout_jiffies = 0; /* Can't be set for UD/UC in modify_qp */
@@ -288,7 +288,7 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
 
 	skb_queue_head_init(&qp->resp_pkts);
 
-	rxe_init_task(&qp->resp.task, qp, rxe_responder, RXE_TASK_TYPE_TASKLET);
+	rxe_init_task(&qp->resp.task, qp, rxe_responder, rxe_resp_task_type);
 
 	qp->resp.opcode		= OPCODE_NONE;
 	qp->resp.msn		= 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index ea33ea3bc0b1..350b033e6e59 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,6 +10,14 @@
 
 #include "rxe.h"
 
+int rxe_req_task_type = RXE_TASK_TYPE_TASKLET;
+int rxe_comp_task_type = RXE_TASK_TYPE_TASKLET;
+int rxe_resp_task_type = RXE_TASK_TYPE_TASKLET;
+
+module_param_named(req_task_type, rxe_req_task_type, int, 0664);
+module_param_named(comp_task_type, rxe_comp_task_type, int, 0664);
+module_param_named(resp_task_type, rxe_resp_task_type, int, 0664);
+
 static struct workqueue_struct *rxe_wq;
 
 int rxe_alloc_wq(void)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 4887ca566769..4c37b7c47a60 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -7,6 +7,10 @@
 #ifndef RXE_TASK_H
 #define RXE_TASK_H
 
+extern int rxe_req_task_type;
+extern int rxe_comp_task_type;
+extern int rxe_resp_task_type;
+
 struct rxe_task;
 
 struct rxe_task_ops {
-- 
2.34.1



* Re: [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks
  2022-10-18  4:33 ` [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks Bob Pearson
@ 2022-10-18  8:59   ` Leon Romanovsky
  2022-10-18 15:18     ` Bob Pearson
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2022-10-18  8:59 UTC (permalink / raw)
  To: Bob Pearson
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On Mon, Oct 17, 2022 at 11:33:46PM -0500, Bob Pearson wrote:
> Add a third task type RXE_TASK_TYPE_WORKQUEUE to rxe_task.c.

Why do you need an extra type and not instead of RXE_TASK_TYPE_TASKLET?

> 
> Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe.c      |  9 ++-
>  drivers/infiniband/sw/rxe/rxe_task.c | 84 ++++++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_task.h | 10 +++-
>  3 files changed, 101 insertions(+), 2 deletions(-)

<...>

> +static struct workqueue_struct *rxe_wq;
> +
> +int rxe_alloc_wq(void)
> +{
> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_MEM_RECLAIM |
> +				WQ_HIGHPRI | WQ_CPU_INTENSIVE |
> +				WQ_SYSFS, WQ_MAX_ACTIVE);

Are you sure that all these flags can be justified? WQ_MEM_RECLAIM?

> +
> +	if (!rxe_wq)
> +		return -ENOMEM;
> +
> +	return 0;
> +}

<...>

> +static void work_sched(struct rxe_task *task)
> +{
> +	if (!task->valid)
> +		return;
> +
> +	queue_work(rxe_wq, &task->work);
> +}
> +
> +static void work_do_task(struct work_struct *work)
> +{
> +	struct rxe_task *task = container_of(work, typeof(*task), work);
> +
> +	if (!task->valid)
> +		return;

How can it be that submitted task is not valid? Especially without any
locking.

> +
> +	do_task(task);
> +}

Thanks

> +


* Re: [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type
  2022-10-18  4:33 ` [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type Bob Pearson
@ 2022-10-18  9:02   ` Leon Romanovsky
  2022-10-18 15:22     ` Bob Pearson
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2022-10-18  9:02 UTC (permalink / raw)
  To: Bob Pearson
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On Mon, Oct 17, 2022 at 11:33:47PM -0500, Bob Pearson wrote:
> Add modparams to control the task types for req, comp, and resp
> tasks.

You need to be more descriptive why module parameters are unavoidable.

Thanks


* Re: [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks
  2022-10-18  8:59   ` Leon Romanovsky
@ 2022-10-18 15:18     ` Bob Pearson
  2022-10-18 17:52       ` Leon Romanovsky
  2022-10-20  9:28       ` matsuda-daisuke
  0 siblings, 2 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-18 15:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On 10/18/22 03:59, Leon Romanovsky wrote:
> On Mon, Oct 17, 2022 at 11:33:46PM -0500, Bob Pearson wrote:
>> Add a third task type RXE_TASK_TYPE_WORKQUEUE to rxe_task.c.
> 
> Why do you need an extra type and not instead of RXE_TASK_TYPE_TASKLET?

It performs much better in some settings.
> 
>>
>> Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>  drivers/infiniband/sw/rxe/rxe.c      |  9 ++-
>>  drivers/infiniband/sw/rxe/rxe_task.c | 84 ++++++++++++++++++++++++++++
>>  drivers/infiniband/sw/rxe/rxe_task.h | 10 +++-
>>  3 files changed, 101 insertions(+), 2 deletions(-)
> 
> <...>
> 
>> +static struct workqueue_struct *rxe_wq;
>> +
>> +int rxe_alloc_wq(void)
>> +{
>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_MEM_RECLAIM |
>> +				WQ_HIGHPRI | WQ_CPU_INTENSIVE |
>> +				WQ_SYSFS, WQ_MAX_ACTIVE);
> 
> Are you sure that all these flags can be justified? WQ_MEM_RECLAIM?

Not really. CPU intensive is most likely correct. The rest not so much.
> 
>> +
>> +	if (!rxe_wq)
>> +		return -ENOMEM;
>> +
>> +	return 0;
>> +}
> 
> <...>
> 
>> +static void work_sched(struct rxe_task *task)
>> +{
>> +	if (!task->valid)
>> +		return;
>> +
>> +	queue_work(rxe_wq, &task->work);
>> +}
>> +
>> +static void work_do_task(struct work_struct *work)
>> +{
>> +	struct rxe_task *task = container_of(work, typeof(*task), work);
>> +
>> +	if (!task->valid)
>> +		return;
> 
> How can it be that submitted task is not valid? Especially without any
> locking.

This and a similar subroutine for tasklets are invoked as deferred work and can run a significant
time after being scheduled. In the meantime someone could have tried to destroy the QP. The valid
flag is only cleared by the QP destroy code and is never turned back on. Perhaps a rmb() is needed.
> 
>> +
>> +	do_task(task);
>> +}
> 
> Thanks
> 
>> +



* Re: [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type
  2022-10-18  9:02   ` Leon Romanovsky
@ 2022-10-18 15:22     ` Bob Pearson
  2022-10-18 17:55       ` Leon Romanovsky
  0 siblings, 1 reply; 29+ messages in thread
From: Bob Pearson @ 2022-10-18 15:22 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On 10/18/22 04:02, Leon Romanovsky wrote:
> On Mon, Oct 17, 2022 at 11:33:47PM -0500, Bob Pearson wrote:
>> Add modparams to control the task types for req, comp, and resp
>> tasks.
> 
> You need to be more descriptive why module parameters are unavoidable.
> 
> Thanks

I asked Jason what was the best way here and didn't get an answer. These are tuning parameters.
Generally I am not sure how to present them to users. They are pretty specific to this
driver so the rdma app seems a bad choice. I know netlink is the preferred way to talk to
rdma-core but I haven't figured out how it works. I suspect this is temporary and work queues
will replace tasklets in this driver once people are used to it.

Bob
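
As context for the discussion, parameters like these would be set through the usual
modparam mechanisms. A hypothetical usage sketch, assuming the numeric values follow
enum rxe_task_type from patch 15 (0 = inline, 1 = tasklet, 2 = workqueue):

===
# /etc/modprobe.d/rdma_rxe.conf: select workqueue execution for all
# three tasks at module load time
options rdma_rxe req_task_type=2 comp_task_type=2 resp_task_type=2
===

Since the parameters are registered with mode 0664 they would also show up under
/sys/module/rdma_rxe/parameters/ for runtime changes, although a change would only
affect QPs created afterwards.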


* Re: [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks
  2022-10-18 15:18     ` Bob Pearson
@ 2022-10-18 17:52       ` Leon Romanovsky
  2022-10-20  9:28       ` matsuda-daisuke
  1 sibling, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2022-10-18 17:52 UTC (permalink / raw)
  To: Bob Pearson
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On Tue, Oct 18, 2022 at 10:18:13AM -0500, Bob Pearson wrote:
> On 10/18/22 03:59, Leon Romanovsky wrote:
> > On Mon, Oct 17, 2022 at 11:33:46PM -0500, Bob Pearson wrote:
> >> Add a third task type RXE_TASK_TYPE_WORKQUEUE to rxe_task.c.
> > 
> > Why do you need an extra type and not instead of RXE_TASK_TYPE_TASKLET?
> 
> It performs much better in some settings.
> > 
> >>
> >> Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe.c      |  9 ++-
> >>  drivers/infiniband/sw/rxe/rxe_task.c | 84 ++++++++++++++++++++++++++++
> >>  drivers/infiniband/sw/rxe/rxe_task.h | 10 +++-
> >>  3 files changed, 101 insertions(+), 2 deletions(-)

<...>

> >> +static void work_do_task(struct work_struct *work)
> >> +{
> >> +	struct rxe_task *task = container_of(work, typeof(*task), work);
> >> +
> >> +	if (!task->valid)
> >> +		return;
> > 
> > How can it be that submitted task is not valid? Especially without any
> > locking.
> 
> This and a similar subroutine for tasklets are called deferred and can have a significant
> delay before being called. In the mean time someone could have tried to destroy the QP. The valid
> flag is only cleared by QP destroy code and is not turned back on. Perhaps a rmb().

rmb() is not locking.

> > 
> >> +
> >> +	do_task(task);
> >> +}
> > 
> > Thanks
> > 
> >> +
> 


* Re: [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type
  2022-10-18 15:22     ` Bob Pearson
@ 2022-10-18 17:55       ` Leon Romanovsky
  0 siblings, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2022-10-18 17:55 UTC (permalink / raw)
  To: Bob Pearson
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, linux-rdma,
	jenny.hack, ian.ziemba

On Tue, Oct 18, 2022 at 10:22:09AM -0500, Bob Pearson wrote:
> On 10/18/22 04:02, Leon Romanovsky wrote:
> > On Mon, Oct 17, 2022 at 11:33:47PM -0500, Bob Pearson wrote:
> >> Add modparams to control the task types for req, comp, and resp
> >> tasks.
> > 
> > You need to be more descriptive why module parameters are unavoidable.
> > 
> > Thanks
> 
> I asked Jason what was the best way here and didn't get an answer. These are tuning parameters.
> Generally I am not sure how to present them to users. They are pretty specific to this
> driver so the rdma app seems a bad choice. I know netlink is the preferred way to talk to
> rdma-core but I haven't figured out how it works. I suspect this is temporary and work queues
> will replace tasklets in this driver once people are used to it.

I think that the best way is to remove tasklets from RXE, unless someone
comes forward to explain why they must stay (not a theoretical explanation,
but a practical use).

Thanks

> 
> Bob


* Re: [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static
  2022-10-18  4:33 ` [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static Bob Pearson
@ 2022-10-19  9:39   ` matsuda-daisuke
  0 siblings, 0 replies; 29+ messages in thread
From: matsuda-daisuke @ 2022-10-19  9:39 UTC (permalink / raw)
  To: 'Bob Pearson',
	jgg, zyjzyj2000, lizhijian, leon, linux-rdma, jenny.hack,
	ian.ziemba
  Cc: matsuda-daisuke

On Tue, Oct 18, 2022 1:34 PM Bob Pearson wrote:
> The subroutine rxe_do_task() is only called in rxe_task.c. This patch
> makes it static and renames it do_task().
> 
> Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
While the description above says "renames it do_task()", this patch
actually does not rename it. Because of this, the 5th patch failed to apply for me.

BTW, thank you very much for posting the patches.

Daisuke

>  drivers/infiniband/sw/rxe/rxe_task.c | 2 +-
>  drivers/infiniband/sw/rxe/rxe_task.h | 8 --------
>  2 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 446ee2c3d381..097ddb16c230 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -28,7 +28,7 @@ int __rxe_do_task(struct rxe_task *task)
>   * a second caller finds the task already running
>   * but looks just after the last call to func
>   */
> -void rxe_do_task(struct tasklet_struct *t)
> +static void rxe_do_task(struct tasklet_struct *t)
>  {
>  	int cont;
>  	int ret;
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
> index 590b1c1d7e7c..99e0173e5c46 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.h
> +++ b/drivers/infiniband/sw/rxe/rxe_task.h
> @@ -44,14 +44,6 @@ void rxe_cleanup_task(struct rxe_task *task);
>   */
>  int __rxe_do_task(struct rxe_task *task);
> 
> -/*
> - * common function called by any of the main tasklets
> - * If there is any chance that there is additional
> - * work to do someone must reschedule the task before
> - * leaving
> - */
> -void rxe_do_task(struct tasklet_struct *t);
> -
>  void rxe_run_task(struct rxe_task *task);
> 
>  void rxe_sched_task(struct rxe_task *task);
> --
> 2.34.1



* RE: [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks
  2022-10-18 15:18     ` Bob Pearson
  2022-10-18 17:52       ` Leon Romanovsky
@ 2022-10-20  9:28       ` matsuda-daisuke
  1 sibling, 0 replies; 29+ messages in thread
From: matsuda-daisuke @ 2022-10-20  9:28 UTC (permalink / raw)
  To: 'Bob Pearson', Leon Romanovsky
  Cc: jgg, zyjzyj2000, lizhijian, linux-rdma, jenny.hack, ian.ziemba

On Wed, Oct 19, 2022 12:18 AM Bob Pearson wrote:
> On 10/18/22 03:59, Leon Romanovsky wrote:
> > On Mon, Oct 17, 2022 at 11:33:46PM -0500, Bob Pearson wrote:
> >> Add a third task type RXE_TASK_TYPE_WORKQUEUE to rxe_task.c.
> >
> > Why do you need an extra type and not instead of RXE_TASK_TYPE_TASKLET?
> 
> It performs much better in some settings.
> >
> >>
> >> Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe.c      |  9 ++-
> >>  drivers/infiniband/sw/rxe/rxe_task.c | 84 ++++++++++++++++++++++++++++
> >>  drivers/infiniband/sw/rxe/rxe_task.h | 10 +++-
> >>  3 files changed, 101 insertions(+), 2 deletions(-)
> >
> > <...>
> >
> >> +static struct workqueue_struct *rxe_wq;
> >> +
> >> +int rxe_alloc_wq(void)
> >> +{
> >> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_MEM_RECLAIM |
> >> +				WQ_HIGHPRI | WQ_CPU_INTENSIVE |
> >> +				WQ_SYSFS, WQ_MAX_ACTIVE);
> >
> > Are you sure that all these flags can be justified? WQ_MEM_RECLAIM?
> 
> Not really. CPU intensive is most likely correct. The rest not so much.

I doubt this workqueue executes work items in the queued order. If the order is changed
unexpectedly on the responder, that may cause failures or unexpected results.
Perhaps we should make it equivalent to alloc_ordered_workqueue() as shown below:
=== workqueue.h===
#define alloc_ordered_workqueue(fmt, flags, args...)                    \
        alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED |                \
                        __WQ_ORDERED_EXPLICIT | (flags), 1, ##args)
===

Daisuke
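
If strict ordering does turn out to be required, the change could be as small as the
following illustrative (untested) hunk against patch 15:

===
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_MEM_RECLAIM |
-				WQ_HIGHPRI | WQ_CPU_INTENSIVE |
-				WQ_SYSFS, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_ordered_workqueue("rxe_wq", WQ_SYSFS);
===

One caveat: a single ordered workqueue executes at most one work item at a time
system-wide, which would give up the multi-QP scaling this series is after; keeping
both properties would likely need per-QP ordering, for example one ordered queue per QP.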

> >
> >> +
> >> +	if (!rxe_wq)
> >> +		return -ENOMEM;
> >> +
> >> +	return 0;
> >> +}
> >
> > <...>
> >
> >> +static void work_sched(struct rxe_task *task)
> >> +{
> >> +	if (!task->valid)
> >> +		return;
> >> +
> >> +	queue_work(rxe_wq, &task->work);
> >> +}
> >> +
> >> +static void work_do_task(struct work_struct *work)
> >> +{
> >> +	struct rxe_task *task = container_of(work, typeof(*task), work);
> >> +
> >> +	if (!task->valid)
> >> +		return;
> >
> > How can it be that submitted task is not valid? Especially without any
> > locking.
> 
> This and a similar subroutine for tasklets are called deferred and can have a significant
> delay before being called. In the mean time someone could have tried to destroy the QP. The valid
> flag is only cleared by QP destroy code and is not turned back on. Perhaps a rmb().
> >
> >> +
> >> +	do_task(task);
> >> +}
> >
> > Thanks
> >
> >> +



* Re: [PATCH for-next 00/16] Implement work queues for rdma_rxe
  2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
                   ` (15 preceding siblings ...)
  2022-10-18  4:33 ` [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type Bob Pearson
@ 2022-10-20 15:02 ` haris iqbal
  2022-10-21  2:46   ` matsuda-daisuke
  2022-10-21  6:02   ` Bob Pearson
  16 siblings, 2 replies; 29+ messages in thread
From: haris iqbal @ 2022-10-20 15:02 UTC (permalink / raw)
  To: Bob Pearson
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba

On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> This patch series implements work queues as an alternative for
> the main tasklets in the rdma_rxe driver. The first few patches
> perform some cleanups of the current tasklet code followed by a
> patch that makes the internal API for task execution pluggable and
> implements an inline and a tasklet based set of functions.
> The remaining patches cleanup the qp reset and error code in the
> three tasklets and modify the locking logic to prevent making
> multiple calls to the tasklet scheduling routine. Finally after
> this preparation the work queue equivalent set of functions is
> added and module parameters are implemented to allow tuning the
> task types.
>
> The advantages of the work queue version of deferred task execution
> is mainly that the work queue variant has much better scalability
> and overall performance than the tasklet variant. The tasklet
> performance saturates with one connected queue pair and stays constant.
> The work queue performance is slightly better for one queue pair but
> scales up with the number of connected queue pairs. The perftest
> microbenchmarks in local loopback mode (not a very realistic test
> case) can reach approximately 100Gb/sec with work queues compared to
> about 16Gb/sec for tasklets.
>
> This patch series is derived from an earlier patch set developed by
> Ian Ziemba at HPE which is used in some Lustre storage clients attached
> to Lustre servers with hard RoCE v2 NICs.
>
> Bob Pearson (16):
>   RDMA/rxe: Remove init of task locks from rxe_qp.c
>   RDMA/rxe: Removed unused name from rxe_task struct
>   RDMA/rxe: Split rxe_run_task() into two subroutines
>   RDMA/rxe: Make rxe_do_task static
>   RDMA/rxe: Rename task->state_lock to task->lock
>   RDMA/rxe: Make task interface pluggable
>   RDMA/rxe: Simplify reset state handling in rxe_resp.c
>   RDMA/rxe: Split rxe_drain_resp_pkts()
>   RDMA/rxe: Handle qp error in rxe_resp.c
>   RDMA/rxe: Cleanup comp tasks in rxe_qp.c
>   RDMA/rxe: Remove __rxe_do_task()
>   RDMA/rxe: Make tasks schedule each other
>   RDMA/rxe: Implement disable/enable_task()
>   RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
>   RDMA/rxe: Add workqueue support for tasks
>   RDMA/rxe: Add parameters to control task type
>
>  drivers/infiniband/sw/rxe/rxe.c       |   9 +-
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
>  drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
>  drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
>  drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
>  drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
>  drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
>  drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
>  9 files changed, 451 insertions(+), 207 deletions(-)
>
>
> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780

The patch series is not applying cleanly over the mentioned commit for
me. Patch '[PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
task->lock' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
I corrected that manually, then it fails in the next commit. Didn't
check after that. Is it the same for others or is it just me?

> --
> 2.34.1
>


* RE: [PATCH for-next 00/16] Implement work queues for rdma_rxe
  2022-10-20 15:02 ` [PATCH for-next 00/16] Implement work queues for rdma_rxe haris iqbal
@ 2022-10-21  2:46   ` matsuda-daisuke
  2022-10-21  3:40     ` Bob Pearson
  2022-10-21  6:02   ` Bob Pearson
  1 sibling, 1 reply; 29+ messages in thread
From: matsuda-daisuke @ 2022-10-21  2:46 UTC (permalink / raw)
  To: 'haris iqbal', Bob Pearson
  Cc: jgg, zyjzyj2000, lizhijian, leon, linux-rdma, jenny.hack, ian.ziemba

On Fri, Oct 21, 2022 12:02 AM haris Iqbal wrote:
> 
> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> >
> > This patch series implements work queues as an alternative for
> > the main tasklets in the rdma_rxe driver. The first few patches
> > perform some cleanups of the current tasklet code followed by a
> > patch that makes the internal API for task execution pluggable and
> > implements an inline and a tasklet based set of functions.
> > The remaining patches cleanup the qp reset and error code in the
> > three tasklets and modify the locking logic to prevent making
> > multiple calls to the tasklet scheduling routine. Finally after
> > this preparation the work queue equivalent set of functions is
> > added and module parameters are implemented to allow tuning the
> > task types.
> >
> > The advantages of the work queue version of deferred task execution
> > is mainly that the work queue variant has much better scalability
> > and overall performance than the tasklet variant. The tasklet
> > performance saturates with one connected queue pair and stays constant.
> > The work queue performance is slightly better for one queue pair but
> > scales up with the number of connected queue pairs. The perftest
> > microbenchmarks in local loopback mode (not a very realistic test
> > case) can reach approximately 100Gb/sec with work queues compared to
> > about 16Gb/sec for tasklets.
> >
> > This patch series is derived from an earlier patch set developed by
> > Ian Ziemba at HPE which is used in some Lustre storage clients attached
> > to Lustre servers with hard RoCE v2 NICs.
> >
> > Bob Pearson (16):
> >   RDMA/rxe: Remove init of task locks from rxe_qp.c
> >   RDMA/rxe: Removed unused name from rxe_task struct
> >   RDMA/rxe: Split rxe_run_task() into two subroutines
> >   RDMA/rxe: Make rxe_do_task static
> >   RDMA/rxe: Rename task->state_lock to task->lock
> >   RDMA/rxe: Make task interface pluggable
> >   RDMA/rxe: Simplify reset state handling in rxe_resp.c
> >   RDMA/rxe: Split rxe_drain_resp_pkts()
> >   RDMA/rxe: Handle qp error in rxe_resp.c
> >   RDMA/rxe: Cleanup comp tasks in rxe_qp.c
> >   RDMA/rxe: Remove __rxe_do_task()
> >   RDMA/rxe: Make tasks schedule each other
> >   RDMA/rxe: Implement disable/enable_task()
> >   RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
> >   RDMA/rxe: Add workqueue support for tasks
> >   RDMA/rxe: Add parameters to control task type
> >
> >  drivers/infiniband/sw/rxe/rxe.c       |   9 +-
> >  drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
> >  drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
> >  drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
> >  drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
> >  drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
> >  drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
> >  drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
> >  drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
> >  9 files changed, 451 insertions(+), 207 deletions(-)
> >
> >
> > base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
> 
> The patch series is not applying cleanly over the mentioned commit for
> me. Patch 'PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
> task->lock.' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
> I corrected that manually, then it fails in the next commit. Didn't
> check after that. Is it the same for others or is it just me?

There is a problem with the 4th patch. Its subject is different from the other patches,
so probably it was not generated at the same time as them. I could apply the rest
after adding the following change to the 4th:
===
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 097ddb16c230..a7203b93e5cc 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -28,7 +28,7 @@ int __rxe_do_task(struct rxe_task *task)
  * a second caller finds the task already running
  * but looks just after the last call to func
  */
-static void rxe_do_task(struct tasklet_struct *t)
+static void do_task(struct tasklet_struct *t)
 {
        int cont;
        int ret;
@@ -100,7 +100,7 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
        task->func      = func;
        task->destroyed = false;

-       tasklet_setup(&task->tasklet, rxe_do_task);
+       tasklet_setup(&task->tasklet, do_task);

        task->state = TASK_STATE_START;
        spin_lock_init(&task->state_lock);
@@ -132,7 +132,7 @@ void rxe_run_task(struct rxe_task *task)
        if (task->destroyed)
                return;

-       rxe_do_task(&task->tasklet);
+       do_task(&task->tasklet);
 }

 void rxe_sched_task(struct rxe_task *task)
===

Daisuke

> 
> > --
> > 2.34.1
> >

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH for-next 00/16] Implement work queues for rdma_rxe
  2022-10-21  2:46   ` matsuda-daisuke
@ 2022-10-21  3:40     ` Bob Pearson
  0 siblings, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-21  3:40 UTC (permalink / raw)
  To: matsuda-daisuke, 'haris iqbal'
  Cc: jgg, zyjzyj2000, lizhijian, leon, linux-rdma, jenny.hack, ian.ziemba

On 10/20/22 21:46, matsuda-daisuke@fujitsu.com wrote:
> On Fri, Oct 21, 2022 12:02 AM haris Iqbal wrote:
>>
>> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>>
>>> This patch series implements work queues as an alternative for
>>> the main tasklets in the rdma_rxe driver. The first few patches
>>> perform some cleanups of the current tasklet code followed by a
>>> patch that makes the internal API for task execution pluggable and
>>> implements an inline and a tasklet based set of functions.
>>> The remaining patches clean up the qp reset and error code in the
>>> three tasklets and modify the locking logic to prevent making
>>> multiple calls to the tasklet scheduling routine. Finally after
>>> this preparation the work queue equivalent set of functions is
>>> added and module parameters are implemented to allow tuning the
>>> task types.
>>>
>>> The main advantage of the work queue version of deferred task execution
>>> is that the work queue variant has much better scalability
>>> and overall performance than the tasklet variant. The tasklet
>>> performance saturates with one connected queue pair and stays constant.
>>> The work queue performance is slightly better for one queue pair but
>>> scales up with the number of connected queue pairs. The perftest
>>> microbenchmarks in local loopback mode (not a very realistic test
>>> case) can reach approximately 100Gb/sec with work queues compared to
>>> about 16Gb/sec for tasklets.
>>>
>>> This patch series is derived from an earlier patch set developed by
>>> Ian Ziemba at HPE which is used in some Lustre storage clients attached
>>> to Lustre servers with hardware RoCE v2 NICs.
>>>
>>> Bob Pearson (16):
>>>   RDMA/rxe: Remove init of task locks from rxe_qp.c
>>>   RDMA/rxe: Removed unused name from rxe_task struct
>>>   RDMA/rxe: Split rxe_run_task() into two subroutines
>>>   RDMA/rxe: Make rxe_do_task static
>>>   RDMA/rxe: Rename task->state_lock to task->lock
>>>   RDMA/rxe: Make task interface pluggable
>>>   RDMA/rxe: Simplify reset state handling in rxe_resp.c
>>>   RDMA/rxe: Split rxe_drain_resp_pkts()
>>>   RDMA/rxe: Handle qp error in rxe_resp.c
>>>   RDMA/rxe: Cleanup comp tasks in rxe_qp.c
>>>   RDMA/rxe: Remove __rxe_do_task()
>>>   RDMA/rxe: Make tasks schedule each other
>>>   RDMA/rxe: Implement disable/enable_task()
>>>   RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
>>>   RDMA/rxe: Add workqueue support for tasks
>>>   RDMA/rxe: Add parameters to control task type
>>>
>>>  drivers/infiniband/sw/rxe/rxe.c       |   9 +-
>>>  drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
>>>  drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
>>>  drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
>>>  drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
>>>  drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
>>>  drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
>>>  drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
>>>  drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
>>>  9 files changed, 451 insertions(+), 207 deletions(-)
>>>
>>>
>>> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
>>
>> The patch series is not applying cleanly over the mentioned commit for
>> me. Patch '[PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
>> task->lock' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
>> I corrected that manually, then it fails in the next commit. I didn't
>> check after that. Is it the same for others or is it just me?
> 
> There is a problem with the 4th patch. Its subject line differs from the other
> patches', so it was probably not generated at the same time as them. I could apply
> the rest after adding the following change to the 4th:
> ===

I messed up the subject and resent it just after the other ones.

> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 097ddb16c230..a7203b93e5cc 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -28,7 +28,7 @@ int __rxe_do_task(struct rxe_task *task)
>   * a second caller finds the task already running
>   * but looks just after the last call to func
>   */
> -static void rxe_do_task(struct tasklet_struct *t)
> +static void do_task(struct tasklet_struct *t)
>  {
>         int cont;
>         int ret;
> @@ -100,7 +100,7 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *))
>         task->func      = func;
>         task->destroyed = false;
> 
> -       tasklet_setup(&task->tasklet, rxe_do_task);
> +       tasklet_setup(&task->tasklet, do_task);
> 
>         task->state = TASK_STATE_START;
>         spin_lock_init(&task->state_lock);
> @@ -132,7 +132,7 @@ void rxe_run_task(struct rxe_task *task)
>         if (task->destroyed)
>                 return;
> 
> -       rxe_do_task(&task->tasklet);
> +       do_task(&task->tasklet);
>  }
> 
>  void rxe_sched_task(struct rxe_task *task)
> ===
> 
> Daisuke
> 
>>
>>> --
>>> 2.34.1
>>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH for-next 00/16] Implement work queues for rdma_rxe
  2022-10-20 15:02 ` [PATCH for-next 00/16] Implement work queues for rdma_rxe haris iqbal
  2022-10-21  2:46   ` matsuda-daisuke
@ 2022-10-21  6:02   ` Bob Pearson
  1 sibling, 0 replies; 29+ messages in thread
From: Bob Pearson @ 2022-10-21  6:02 UTC (permalink / raw)
  To: haris iqbal
  Cc: jgg, zyjzyj2000, matsuda-daisuke, lizhijian, leon, linux-rdma,
	jenny.hack, ian.ziemba

On 10/20/22 10:02, haris iqbal wrote:
> On Tue, Oct 18, 2022 at 6:39 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> This patch series implements work queues as an alternative for
>> the main tasklets in the rdma_rxe driver. The first few patches
>> perform some cleanups of the current tasklet code followed by a
>> patch that makes the internal API for task execution pluggable and
>> implements an inline and a tasklet based set of functions.
>> The remaining patches clean up the qp reset and error code in the
>> three tasklets and modify the locking logic to prevent making
>> multiple calls to the tasklet scheduling routine. Finally after
>> this preparation the work queue equivalent set of functions is
>> added and module parameters are implemented to allow tuning the
>> task types.
>>
>> The main advantage of the work queue version of deferred task execution
>> is that the work queue variant has much better scalability
>> and overall performance than the tasklet variant. The tasklet
>> performance saturates with one connected queue pair and stays constant.
>> The work queue performance is slightly better for one queue pair but
>> scales up with the number of connected queue pairs. The perftest
>> microbenchmarks in local loopback mode (not a very realistic test
>> case) can reach approximately 100Gb/sec with work queues compared to
>> about 16Gb/sec for tasklets.
>>
>> This patch series is derived from an earlier patch set developed by
>> Ian Ziemba at HPE which is used in some Lustre storage clients attached
>> to Lustre servers with hardware RoCE v2 NICs.
>>
>> Bob Pearson (16):
>>   RDMA/rxe: Remove init of task locks from rxe_qp.c
>>   RDMA/rxe: Removed unused name from rxe_task struct
>>   RDMA/rxe: Split rxe_run_task() into two subroutines
>>   RDMA/rxe: Make rxe_do_task static
>>   RDMA/rxe: Rename task->state_lock to task->lock
>>   RDMA/rxe: Make task interface pluggable
>>   RDMA/rxe: Simplify reset state handling in rxe_resp.c
>>   RDMA/rxe: Split rxe_drain_resp_pkts()
>>   RDMA/rxe: Handle qp error in rxe_resp.c
>>   RDMA/rxe: Cleanup comp tasks in rxe_qp.c
>>   RDMA/rxe: Remove __rxe_do_task()
>>   RDMA/rxe: Make tasks schedule each other
>>   RDMA/rxe: Implement disable/enable_task()
>>   RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE
>>   RDMA/rxe: Add workqueue support for tasks
>>   RDMA/rxe: Add parameters to control task type
>>
>>  drivers/infiniband/sw/rxe/rxe.c       |   9 +-
>>  drivers/infiniband/sw/rxe/rxe_comp.c  |  35 ++-
>>  drivers/infiniband/sw/rxe/rxe_net.c   |   4 +-
>>  drivers/infiniband/sw/rxe/rxe_qp.c    |  87 +++----
>>  drivers/infiniband/sw/rxe/rxe_req.c   |  10 +-
>>  drivers/infiniband/sw/rxe/rxe_resp.c  |  75 ++++--
>>  drivers/infiniband/sw/rxe/rxe_task.c  | 354 ++++++++++++++++++++------
>>  drivers/infiniband/sw/rxe/rxe_task.h  |  76 +++---
>>  drivers/infiniband/sw/rxe/rxe_verbs.c |   8 +-
>>  9 files changed, 451 insertions(+), 207 deletions(-)
>>
>>
>> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
> 
> The patch series is not applying cleanly over the mentioned commit for
> me. Patch '[PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to
> task->lock' fails at "drivers/infiniband/sw/rxe/rxe_task.c:103".
> I corrected that manually, then it fails in the next commit. I didn't
> check after that. Is it the same for others or is it just me?
> 
>> --
>> 2.34.1
>>

This worked for me. There was a botched 4/16 which I resent just after the other ones.
You may need to delete the first 4/16 and use the second one. I am going to resend the
series tomorrow. There are a couple of things folks have pointed out that I want to address.

Bob

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-10-21  6:02 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18  4:33 [PATCH for-next 00/16] Implement work queues for rdma_rxe Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 01/16] RDMA/rxe: Remove init of task locks from rxe_qp.c Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 02/16] RDMA/rxe: Removed unused name from rxe_task struct Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 03/16] RDMA/rxe: Split rxe_run_task() into two subroutines Bob Pearson
2022-10-18  4:33 ` [PATCH 04/16] for-next RDMA/rxe: Make rxe_do_task static Bob Pearson
2022-10-19  9:39   ` matsuda-daisuke
2022-10-18  4:33 ` [PATCH for-next 05/16] RDMA/rxe: Rename task->state_lock to task->lock Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 06/16] RDMA/rxe: Make task interface pluggable Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 07/16] RDMA/rxe: Simplify reset state handling in rxe_resp.c Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 08/16] RDMA/rxe: Split rxe_drain_resp_pkts() Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 09/16] RDMA/rxe: Handle qp error in rxe_resp.c Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 10/16] RDMA/rxe: Cleanup comp tasks in rxe_qp.c Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 11/16] RDMA/rxe: Remove __rxe_do_task() Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 12/16] RDMA/rxe: Make tasks schedule each other Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 13/16] RDMA/rxe: Implement disable/enable_task() Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 14/16] RDMA/rxe: Replace TASK_STATE_START by TASK_STATE_IDLE Bob Pearson
2022-10-18  4:33 ` [PATCH for-next 15/16] RDMA/rxe: Add workqueue support for tasks Bob Pearson
2022-10-18  8:59   ` Leon Romanovsky
2022-10-18 15:18     ` Bob Pearson
2022-10-18 17:52       ` Leon Romanovsky
2022-10-20  9:28       ` matsuda-daisuke
2022-10-18  4:33 ` [PATCH for-next 16/16] RDMA/rxe: Add parameters to control task type Bob Pearson
2022-10-18  9:02   ` Leon Romanovsky
2022-10-18 15:22     ` Bob Pearson
2022-10-18 17:55       ` Leon Romanovsky
2022-10-20 15:02 ` [PATCH for-next 00/16] Implement work queues for rdma_rxe haris iqbal
2022-10-21  2:46   ` matsuda-daisuke
2022-10-21  3:40     ` Bob Pearson
2022-10-21  6:02   ` Bob Pearson
