* [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue
@ 2022-07-04  6:00 lizhijian
  2022-07-04  6:00 ` [PATCH v5 1/4] RDMA/rxe: Update wqe_index for each wqe error completion lizhijian
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: lizhijian @ 2022-07-04  6:00 UTC
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma, Bob Pearson
  Cc: Cheng Xu, lizhijian

It was observed that no more completions are generated after a few incorrect
posts, which blocks the polling. It can easily be reproduced with the following
pattern:

a. post correct RDMA_WRITE
b. poll completion event
while true {
  c. post incorrect RDMA_WRITE(wrong rkey for example)
  d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
}

V4 adds a new patch from Bob that makes the requester stop executing QP
operations as soon as possible.

Both blktests and pyverbs tests pass.

Bob Pearson (1):
  RDMA/rxe: Split qp state for requester and completer

Li Zhijian (3):
  RDMA/rxe: Update wqe_index for each wqe error completion
  RDMA/rxe: Generate error completion for error requester QP state
  RDMA/rxe: Fix typo in comment

 drivers/infiniband/sw/rxe/rxe_comp.c  |  6 +++---
 drivers/infiniband/sw/rxe/rxe_qp.c    |  5 +++++
 drivers/infiniband/sw/rxe/rxe_req.c   | 16 +++++++++++++++-
 drivers/infiniband/sw/rxe/rxe_task.c  |  2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 5 files changed, 25 insertions(+), 5 deletions(-)

-- 
2.31.1


* [PATCH v5 1/4] RDMA/rxe: Update wqe_index for each wqe error completion
  2022-07-04  6:00 [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue lizhijian
@ 2022-07-04  6:00 ` lizhijian
  2022-07-04  6:00 ` [PATCH v5 2/4] RDMA/rxe: Generate error completion for error requester QP state lizhijian
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: lizhijian @ 2022-07-04  6:00 UTC
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma, Bob Pearson
  Cc: Cheng Xu, lizhijian

Previously, if user space kept posting abnormal WQEs, queue.index would
keep increasing while qp->req.wqe_index did not. Once qp->req.wqe_index
equals queue.index in the next round, req_next_wqe() treats the queue as
empty, and no new completion is generated.

Update wqe_index for each WQE error completion so that req_next_wqe() can
fetch the next WQE properly.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 9d98237389cf..4ffc4ebd6e28 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -759,6 +759,8 @@ int rxe_requester(void *arg)
 	if (ah)
 		rxe_put(ah);
 err:
+	/* update wqe_index for each wqe completion */
+	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
 
-- 
2.31.1
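
The index mismatch described in patch 1/4 can be illustrated with a small
user-space model. This is a sketch under stated assumptions, not driver
code: struct model_sq, model_post(), model_requester() and run() are
hypothetical names, every posted WQE is assumed to fail, the completer is
assumed to free each slot immediately, and the queue is shrunk to 4
entries so the wrap-around ("next round") happens quickly. Only the
masking step mirrors the driver's queue_next_index(). It models the
mechanism in the commit message, not the exact post count from the
cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the rxe driver's data structures. */
#define MODEL_QSIZE 4u                   /* power of two, like rxe queues */
#define MODEL_MASK  (MODEL_QSIZE - 1u)

struct model_sq {
	unsigned int prod;       /* producer index (queue.index in the log) */
	unsigned int wqe_index;  /* requester index (qp->req.wqe_index) */
};

/* Same masking idea as queue_next_index() in the driver. */
static unsigned int model_next_index(unsigned int index)
{
	return (index + 1u) & MODEL_MASK;
}

/* Post one WQE; assumes the completer has already freed a slot. */
static void model_post(struct model_sq *sq)
{
	sq->prod = model_next_index(sq->prod);
}

/*
 * One requester pass in which the WQE always fails (e.g. a bad rkey).
 * Returns true if an error completion is generated.
 */
static bool model_requester(struct model_sq *sq, bool advance_on_error)
{
	if (sq->wqe_index == sq->prod)   /* req_next_wqe(): queue looks empty */
		return false;

	/* err: path */
	if (advance_on_error)            /* the line patch 1/4 adds */
		sq->wqe_index = model_next_index(sq->wqe_index);
	return true;
}

/* Post `posts` failing WQEs and count the error completions seen. */
unsigned int run(unsigned int posts, bool advance_on_error)
{
	struct model_sq sq = { 0, 0 };
	unsigned int completions = 0;

	for (unsigned int i = 0; i < posts; i++) {
		model_post(&sq);
		if (model_requester(&sq, advance_on_error))
			completions++;
		/* With the fix, the requester always catches up. */
		if (advance_on_error)
			assert(sq.wqe_index == sq.prod);
	}
	return completions;
}
```

In this model run(8, false) yields 6 completions for 8 posts: on the 4th
and 8th post the producer index wraps back onto the stale wqe_index, the
queue looks empty, and that WQE never completes. run(8, true) yields all 8.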


* [PATCH v5 2/4] RDMA/rxe: Generate error completion for error requester QP state
  2022-07-04  6:00 [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue lizhijian
  2022-07-04  6:00 ` [PATCH v5 1/4] RDMA/rxe: Update wqe_index for each wqe error completion lizhijian
@ 2022-07-04  6:00 ` lizhijian
  2022-07-04  6:00 ` [PATCH v5 3/4] RDMA/rxe: Split qp state for requester and completer lizhijian
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: lizhijian @ 2022-07-04  6:00 UTC
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma, Bob Pearson
  Cc: Cheng Xu, lizhijian

Per the IBTA specification, all subsequent WQEs posted while the QP is in
the error state should be completed with a flush error.

Check for QP_STATE_ERROR after req_next_wqe() so that rxe_completer() gets
a chance to run; it sets the completion status to FLUSH ERROR, so the
completion can be associated with its WQE.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V5: parentheses issue # Cheng Xu
V4: check QP ERROR before QP RESET # Bob
V3: unlikely() optimization # Cheng Xu <chengyou@linux.alibaba.com>
    update commit log # Haakon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 4ffc4ebd6e28..6d2742997e1b 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -610,9 +610,20 @@ int rxe_requester(void *arg)
 		return -EAGAIN;
 
 next_wqe:
-	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
+	if (unlikely(!qp->valid))
 		goto exit;
 
+	if (unlikely(qp->req.state == QP_STATE_ERROR)) {
+		wqe = req_next_wqe(qp);
+		if (wqe)
+			/*
+			 * Generate an error completion for error qp state
+			 */
+			goto err;
+		else
+			goto exit;
+	}
+
 	if (unlikely(qp->req.state == QP_STATE_RESET)) {
 		qp->req.wqe_index = queue_get_consumer(q,
 						QUEUE_TYPE_FROM_CLIENT);
-- 
2.31.1
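
The control flow that patch 2/4 introduces at the next_wqe: label can be
summarised with a small model. It is a sketch only: the enums, struct
model_qp, model_next_step() and model_flush_all() are hypothetical,
req_next_wqe() is reduced to a counter of pending WQEs, and the completer
is assumed to consume exactly one WQE per error completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the check at the next_wqe: label. */
enum model_state { MODEL_RESET, MODEL_READY, MODEL_ERROR };
enum model_action { MODEL_EXIT, MODEL_EXECUTE, MODEL_FLUSH };

struct model_qp {
	bool valid;
	enum model_state req_state;
	unsigned int pending;            /* WQEs req_next_wqe() would return */
};

/* Decide what one requester pass does; old_check selects pre-patch logic. */
static enum model_action model_next_step(const struct model_qp *qp,
					 bool old_check)
{
	if (!qp->valid)
		return MODEL_EXIT;

	if (qp->req_state == MODEL_ERROR) {
		if (old_check)               /* pre-patch: bail out silently */
			return MODEL_EXIT;
		/* patch 2/4: complete the pending WQE with an error */
		return qp->pending ? MODEL_FLUSH : MODEL_EXIT;
	}

	if (qp->req_state == MODEL_RESET)
		return MODEL_EXIT;           /* real code also resets wqe_index */

	return qp->pending ? MODEL_EXECUTE : MODEL_EXIT;
}

/* Run requester passes on an errored QP; count the flush completions. */
unsigned int model_flush_all(struct model_qp *qp, bool old_check)
{
	unsigned int completions = 0;

	while (model_next_step(qp, old_check) == MODEL_FLUSH) {
		assert(qp->pending > 0);
		qp->pending--;               /* completer consumes the WQE */
		completions++;
	}
	return completions;
}
```

With three WQEs queued on an errored QP, model_flush_all() returns 3 with
the new check and 0 with old_check set; the latter corresponds to the
missing completions described in the cover letter.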


* [PATCH v5 3/4] RDMA/rxe: Split qp state for requester and completer
  2022-07-04  6:00 [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue lizhijian
  2022-07-04  6:00 ` [PATCH v5 1/4] RDMA/rxe: Update wqe_index for each wqe error completion lizhijian
  2022-07-04  6:00 ` [PATCH v5 2/4] RDMA/rxe: Generate error completion for error requester QP state lizhijian
@ 2022-07-04  6:00 ` lizhijian
  2022-07-04  6:00 ` [PATCH v5 4/4] RDMA/rxe: Fix typo in comment lizhijian
  2022-07-20  5:38 ` [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue Leon Romanovsky
  4 siblings, 0 replies; 9+ messages in thread
From: lizhijian @ 2022-07-04  6:00 UTC
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma, Bob Pearson
  Cc: Cheng Xu

From: Bob Pearson <rpearsonhpe@gmail.com>

Currently the requester can continue to process send WQEs after
a local QP operation error is detected, because setting the QP state
to the error state is deferred until later. This patch splits the QP
state for the completer and requester into two separate states and
sets qp->req.state = QP_STATE_ERROR as soon as the error is detected,
before another WQE can be executed.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
V4: new patch
---
 drivers/infiniband/sw/rxe/rxe_comp.c  | 6 +++---
 drivers/infiniband/sw/rxe/rxe_qp.c    | 5 +++++
 drivers/infiniband/sw/rxe/rxe_req.c   | 1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index da3a398053b8..0b68630a3e49 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -565,10 +565,10 @@ int rxe_completer(void *arg)
 	if (!rxe_get(qp))
 		return -EAGAIN;
 
-	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
-	    qp->req.state == QP_STATE_RESET) {
+	if (!qp->valid || qp->comp.state == QP_STATE_ERROR ||
+	    qp->comp.state == QP_STATE_RESET) {
 		rxe_drain_resp_pkts(qp, qp->valid &&
-				    qp->req.state == QP_STATE_ERROR);
+				    qp->comp.state == QP_STATE_ERROR);
 		ret = -EAGAIN;
 		goto done;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 22e9b85344c3..a95d3b49ae20 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -230,6 +230,7 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 					       QUEUE_TYPE_FROM_CLIENT);
 
 	qp->req.state		= QP_STATE_RESET;
+	qp->comp.state		= QP_STATE_RESET;
 	qp->req.opcode		= -1;
 	qp->comp.opcode		= -1;
 
@@ -490,6 +491,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
 
 	/* move qp to the reset state */
 	qp->req.state = QP_STATE_RESET;
+	qp->comp.state = QP_STATE_RESET;
 	qp->resp.state = QP_STATE_RESET;
 
 	/* let state machines reset themselves drain work and packet queues
@@ -552,6 +554,7 @@ void rxe_qp_error(struct rxe_qp *qp)
 {
 	qp->req.state = QP_STATE_ERROR;
 	qp->resp.state = QP_STATE_ERROR;
+	qp->comp.state = QP_STATE_ERROR;
 	qp->attr.qp_state = IB_QPS_ERR;
 
 	/* drain work and packet queues */
@@ -689,6 +692,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
 			pr_debug("qp#%d state -> INIT\n", qp_num(qp));
 			qp->req.state = QP_STATE_INIT;
 			qp->resp.state = QP_STATE_INIT;
+			qp->comp.state = QP_STATE_INIT;
 			break;
 
 		case IB_QPS_RTR:
@@ -699,6 +703,7 @@ int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask,
 		case IB_QPS_RTS:
 			pr_debug("qp#%d state -> RTS\n", qp_num(qp));
 			qp->req.state = QP_STATE_READY;
+			qp->comp.state = QP_STATE_READY;
 			break;
 
 		case IB_QPS_SQD:
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 6d2742997e1b..ad25290e393d 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -773,6 +773,7 @@ int rxe_requester(void *arg)
 	/* update wqe_index for each wqe completion */
 	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
+	qp->req.state = QP_STATE_ERROR;
 	__rxe_do_task(&qp->comp.task);
 
 exit:
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ac464e68c923..bbfffe243fd6 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -129,6 +129,7 @@ struct rxe_req_info {
 };
 
 struct rxe_comp_info {
+	enum rxe_qp_state	state;
 	u32			psn;
 	int			opcode;
 	int			timeout;
-- 
2.31.1
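
Why the completer gets its own state can be shown with a model of patch
3/4. It is a sketch under assumptions: struct model_qp, model_qp_error(),
model_requester() and model_completer() are hypothetical stand-ins for
qp->req.state, qp->comp.state, rxe_qp_error(), rxe_requester() and
rxe_completer(), and the deferred completer is assumed to run only after
the requester loop ends, which is one possible interleaving, not the only
one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the split requester/completer state. */
enum model_state { MODEL_READY, MODEL_ERROR };

struct model_qp {
	enum model_state req_state;   /* stands in for qp->req.state */
	enum model_state comp_state;  /* stands in for qp->comp.state */
};

/* Stand-in for rxe_qp_error(): every task moves to the error state. */
static void model_qp_error(struct model_qp *qp)
{
	qp->req_state = MODEL_ERROR;
	qp->comp_state = MODEL_ERROR;
}

/*
 * Requester over n WQEs where WQE number fail_at fails. With
 * set_req_error false (pre-patch), the state change is deferred to the
 * completer, so later WQEs still run. Returns how many were executed.
 */
unsigned int model_requester(struct model_qp *qp, unsigned int n,
			     unsigned int fail_at, bool set_req_error)
{
	unsigned int executed = 0;

	assert(fail_at < n);
	for (unsigned int i = 0; i < n; i++) {
		if (qp->req_state == MODEL_ERROR)
			break;                        /* stop executing WQEs */
		executed++;
		if (i == fail_at && set_req_error)
			qp->req_state = MODEL_ERROR;  /* what patch 3/4 adds */
	}
	return executed;
}

/*
 * Completer pass. It checks only comp_state, so the requester having
 * already marked itself as failed does not hide the WQE's real error
 * status. Returns true if the real status was reported, false if the
 * WQE was only drained/flushed.
 */
bool model_completer(struct model_qp *qp)
{
	if (qp->comp_state == MODEL_ERROR)
		return false;
	model_qp_error(qp);
	return true;
}
```

In this model, a failure at the second of five WQEs lets all five execute
without the patch but only two with it. Because the completer looks at
comp_state rather than req_state, it still reports the real error status
once and only flushes afterwards.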


* [PATCH v5 4/4] RDMA/rxe: Fix typo in comment
  2022-07-04  6:00 [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue lizhijian
                   ` (2 preceding siblings ...)
  2022-07-04  6:00 ` [PATCH v5 3/4] RDMA/rxe: Split qp state for requester and completer lizhijian
@ 2022-07-04  6:00 ` lizhijian
  2022-07-14 17:10   ` Bob Pearson
  2022-07-20  5:38 ` [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue Leon Romanovsky
  4 siblings, 1 reply; 9+ messages in thread
From: lizhijian @ 2022-07-04  6:00 UTC
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma, Bob Pearson
  Cc: Cheng Xu, lizhijian

Fix a spelling mistake

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 0c4db5bb17d7..c9b80410cd5b 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -67,7 +67,7 @@ void rxe_do_task(struct tasklet_struct *t)
 				cont = 1;
 			break;
 
-		/* soneone tried to run the task since the last time we called
+		/* someone tried to run the task since the last time we called
 		 * func, so we will call one more time regardless of the
 		 * return value
 		 */
-- 
2.31.1


* Re: [PATCH v5 4/4] RDMA/rxe: Fix typo in comment
  2022-07-04  6:00 ` [PATCH v5 4/4] RDMA/rxe: Fix typo in comment lizhijian
@ 2022-07-14 17:10   ` Bob Pearson
  0 siblings, 0 replies; 9+ messages in thread
From: Bob Pearson @ 2022-07-14 17:10 UTC
  To: lizhijian, Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma; +Cc: Cheng Xu

On 7/4/22 01:00, lizhijian@fujitsu.com wrote:
> Fix a spelling mistake
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_task.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 0c4db5bb17d7..c9b80410cd5b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -67,7 +67,7 @@ void rxe_do_task(struct tasklet_struct *t)
>  				cont = 1;
>  			break;
>  
> -		/* soneone tried to run the task since the last time we called
> +		/* someone tried to run the task since the last time we called
>  		 * func, so we will call one more time regardless of the
>  		 * return value
>  		 */

I think I snuck this in recently in something else but it is correct.

Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>


* Re: [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue
  2022-07-04  6:00 [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue lizhijian
                   ` (3 preceding siblings ...)
  2022-07-04  6:00 ` [PATCH v5 4/4] RDMA/rxe: Fix typo in comment lizhijian
@ 2022-07-20  5:38 ` Leon Romanovsky
  2022-07-20  6:21   ` lizhijian
  4 siblings, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2022-07-20  5:38 UTC
  To: lizhijian
  Cc: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma,
	Bob Pearson, Cheng Xu

On Mon, Jul 04, 2022 at 06:00:54AM +0000, lizhijian@fujitsu.com wrote:

Please fix your gitconfig to have same From/author fields as in Signed-off-by.

Thanks


* Re: [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue
  2022-07-20  5:38 ` [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue Leon Romanovsky
@ 2022-07-20  6:21   ` lizhijian
  2022-07-20  6:33     ` Leon Romanovsky
  0 siblings, 1 reply; 9+ messages in thread
From: lizhijian @ 2022-07-20  6:21 UTC
  To: Leon Romanovsky
  Cc: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma,
	Bob Pearson, Cheng Xu

Hi Leon


On 20/07/2022 13:38, Leon Romanovsky wrote:
> On Mon, Jul 04, 2022 at 06:00:54AM +0000, lizhijian@fujitsu.com wrote:
>
> Please fix your gitconfig to have same From/author fields as in Signed-off-by.

I'm sorry about that. May I know which patch is wrong? I have not updated these fields recently.
Do you mean "[PATCH v5 3/4] RDMA/rxe: Split qp state for requester and completer"? It is from Bob, so
I kept his author and SOB.

Thanks
Zhijian


>
> Thanks


* Re: [PATCH v5 0/4] RDMA/rxe: Fix no completion event issue
  2022-07-20  6:21   ` lizhijian
@ 2022-07-20  6:33     ` Leon Romanovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Leon Romanovsky @ 2022-07-20  6:33 UTC
  To: lizhijian
  Cc: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, linux-rdma,
	Bob Pearson, Cheng Xu

On Wed, Jul 20, 2022 at 06:21:29AM +0000, lizhijian@fujitsu.com wrote:
> Hi Leon
> 
> 
> On 20/07/2022 13:38, Leon Romanovsky wrote:
> > On Mon, Jul 04, 2022 at 06:00:54AM +0000, lizhijian@fujitsu.com wrote:
> >
> > Please fix your gitconfig to have same From/author fields as in Signed-off-by.
> 
> I'm sorry about that. May I know which patch is wrong? I have not updated these fields recently.
> Do you mean "[PATCH v5 3/4] RDMA/rxe: Split qp state for requester and completer"? It is from Bob, so
> I kept his author and SOB.

No, I'm talking about something else. Almost all your patches are sent
with wrong "From:" field.

Let's take your first patch as an example:
https://lore.kernel.org/linux-rdma/20220704060806.1622849-2-lizhijian@fujitsu.com/
From: "lizhijian@fujitsu.com" <lizhijian@fujitsu.com>
...
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

and if I try to apply it, checkpatch throws the following warnings:
➜  kernel git:(rdma-next) ✗ git am --continue
Applying: RDMA/rxe: Update wqe_index for each wqe error completion
➜  kernel git:(rdma-next) git checkpatch
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#9:
qp->req.wqe_index==queue.index in next round, req_next_wqe() will treat queue

WARNING: From:/Signed-off-by: email name mismatch: 'From: "lizhijian@fujitsu.com" <lizhijian@fujitsu.com>' != 'Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>'

0001-RDMA-rxe-Update-wqe_index-for-each-wqe-error-complet.patch total: 0 errors, 2 warnings, 8 lines checked


> 
> Thanks
> Zhijian
> 
> 
> >
> > Thanks

