linux-rdma.vger.kernel.org archive mirror
* [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
@ 2022-05-16  1:53 Li Zhijian
  2022-05-16  1:53 ` [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Li Zhijian @ 2022-05-16  1:53 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma
  Cc: linux-kernel, Li Zhijian

Because an RDMA_WRITE post always succeeds on RXE, we observed that no
more completions occur after a few incorrect posts, which blocks the
polling. The problem is easy to reproduce with the pattern below.

a. post correct RDMA_WRITE
b. poll completion event
while true {
  c. post incorrect RDMA_WRITE(wrong rkey for example)
  d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
}


Li Zhijian (2):
  RDMA/rxe: Update wqe_index for each wqe error completion
  RDMA/rxe: Generate error completion for error requester QP state

 drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.31.1
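
The reproduction pattern in the cover letter above can be sketched as a bounded loop. This is only an illustrative skeleton, not the test program that was actually used: `struct repro_ops`, `run_repro`, `wait_completion` and the `fake_*` helpers are hypothetical names. The two callbacks stand in for however the test posts an RDMA_WRITE and polls the CQ (with libibverbs that would typically be `ibv_post_send()` and `ibv_poll_cq()`), and the fake device exists only so the loop can be exercised without RDMA hardware. Bounding the number of polls lets the test report the stall instead of blocking forever.

```c
#include <stdbool.h>

/* Hypothetical callbacks supplied by the test program. */
struct repro_ops {
	/* Post one RDMA_WRITE; bad_rkey selects a deliberately wrong rkey. */
	int (*post_write)(void *ctx, bool bad_rkey);
	/* Poll the CQ once: 1 if a completion was reaped, 0 if none. */
	int (*poll_one)(void *ctx);
};

/* Poll at most max_polls times; true if a completion showed up. */
static bool wait_completion(const struct repro_ops *ops, void *ctx,
			    unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++)
		if (ops->poll_one(ctx) > 0)
			return true;
	return false;
}

/*
 * Steps a-d of the pattern. Returns how many incorrect posts were
 * completed before polling stalled (bad_posts means no stall), or -1 if
 * a post failed or the initial correct write never completed.
 */
static int run_repro(const struct repro_ops *ops, void *ctx,
		     unsigned int bad_posts, unsigned int max_polls)
{
	if (ops->post_write(ctx, false) != 0)			/* a */
		return -1;
	if (!wait_completion(ops, ctx, max_polls))		/* b */
		return -1;
	for (unsigned int i = 0; i < bad_posts; i++) {
		if (ops->post_write(ctx, true) != 0)		/* c */
			return -1;
		if (!wait_completion(ops, ctx, max_polls))	/* d */
			return (int)i;
	}
	return (int)bad_posts;
}

/* In-memory stand-in for a device (an assumption, for illustration only). */
struct fake_dev {
	unsigned int ready;		/* completions waiting to be polled */
	unsigned int bad_seen;		/* incorrect posts so far */
	unsigned int stall_after;	/* bad posts completed before going silent */
};

static int fake_post_write(void *ctx, bool bad_rkey)
{
	struct fake_dev *dev = ctx;

	if (!bad_rkey || dev->bad_seen++ < dev->stall_after)
		dev->ready++;
	return 0;	/* the post itself always succeeds */
}

static int fake_poll_one(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->ready == 0)
		return 0;
	dev->ready--;
	return 1;
}
```

With the fake device configured to go silent after two incorrect posts, `run_repro()` returns 2; a real test would pass callbacks that wrap the verbs calls instead.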





* [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion
  2022-05-16  1:53 [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
@ 2022-05-16  1:53 ` Li Zhijian
  2022-06-26 21:51   ` Bob Pearson
  2022-05-16  1:53 ` [PATCH v3 2/2] RDMA/rxe: Generate error completion for error requester QP state Li Zhijian
  2022-06-07  8:32 ` [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue lizhijian
  2 siblings, 1 reply; 12+ messages in thread
From: Li Zhijian @ 2022-05-16  1:53 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma
  Cc: linux-kernel, Li Zhijian

Previously, if user space kept posting abnormal WQEs, queue.prod kept
increasing while queue.index did not. Once queue.index == queue.prod in
a later round, req_next_wqe() treats the queue as empty, and no new
completion is generated.

Update wqe_index on each WQE error completion so that req_next_wqe() can
fetch the next WQE properly.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index a0d5e57f73c1..8bdd0b6b578f 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -773,6 +773,8 @@ int rxe_requester(void *arg)
 	if (ah)
 		rxe_put(ah);
 err:
+	/* update wqe_index for each wqe completion */
+	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
 
-- 
2.31.1
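
The false-empty condition described in the commit message of patch 1/2 can be illustrated with a toy ring model. This is a deliberately simplified sketch, not the rxe queue code: `struct toy_sq` and the `toy_*` functions are hypothetical, the ring has only 4 slots, every posted WQE is assumed to fail, and QP state transitions are ignored. It only shows why a requester cursor that never advances on error eventually equals the producer index.

```c
#include <stdbool.h>

#define TOY_SLOTS 4u	/* ring size, must be a power of two */

struct toy_sq {
	unsigned int prod;	/* next slot user space will fill */
	unsigned int cons;	/* next slot the completer will retire */
	unsigned int wqe_index;	/* requester cursor */
};

static unsigned int toy_next_index(unsigned int index)
{
	return (index + 1) & (TOY_SLOTS - 1);
}

/* Post one WQE; fails only when the ring is full. */
static bool toy_post(struct toy_sq *sq)
{
	if (toy_next_index(sq->prod) == sq->cons)
		return false;
	sq->prod = toy_next_index(sq->prod);
	return true;
}

/*
 * One requester pass over a WQE that fails. Returns true if an error
 * completion was generated, false if the queue looked empty.
 */
static bool toy_fail_one(struct toy_sq *sq, bool advance_on_error)
{
	if (sq->wqe_index == sq->prod)	/* cursor == producer: "empty" */
		return false;
	if (advance_on_error)
		sq->wqe_index = toy_next_index(sq->wqe_index);
	sq->cons = toy_next_index(sq->cons);	/* completer retires one WQE */
	return true;
}

/* Post and fail up to `rounds` WQEs; count the completions generated. */
static unsigned int toy_count_completions(unsigned int rounds,
					  bool advance_on_error)
{
	struct toy_sq sq = { 0, 0, 0 };
	unsigned int completions = 0;

	for (unsigned int i = 0; i < rounds; i++) {
		if (!toy_post(&sq))
			break;
		if (toy_fail_one(&sq, advance_on_error))
			completions++;
	}
	return completions;
}
```

In this model `toy_count_completions(4, false)` yields 3: the fourth post is accepted, but the producer has wrapped around to the stuck cursor, so the requester sees an empty queue. With `advance_on_error` set, all four posts produce a completion.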





* [PATCH v3 2/2] RDMA/rxe: Generate error completion for error requester QP state
  2022-05-16  1:53 [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
  2022-05-16  1:53 ` [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
@ 2022-05-16  1:53 ` Li Zhijian
  2022-06-26 22:42   ` Bob Pearson
  2022-06-07  8:32 ` [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue lizhijian
  2 siblings, 1 reply; 12+ messages in thread
From: Li Zhijian @ 2022-05-16  1:53 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma
  Cc: linux-kernel, Li Zhijian

As per the IBTA specification, all subsequent WQEs posted while the QP
is in the error state should be completed with a flush error.

Check QP_STATE_ERROR after req_next_wqe() so that rxe_completer() gets a
chance to run; it marks the completion as FLUSH ERROR, and the
completion can then be associated with its WQE.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3: unlikely() optimization # Cheng Xu <chengyou@linux.alibaba.com>
    update commit log # Haakon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8bdd0b6b578f..c1f1c19f26b2 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -624,7 +624,7 @@ int rxe_requester(void *arg)
 	rxe_get(qp);
 
 next_wqe:
-	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
+	if (unlikely(!qp->valid))
 		goto exit;
 
 	if (unlikely(qp->req.state == QP_STATE_RESET)) {
@@ -646,6 +646,14 @@ int rxe_requester(void *arg)
 	if (unlikely(!wqe))
 		goto exit;
 
+	if (unlikely(qp->req.state == QP_STATE_ERROR)) {
+		/*
+		 * Generate an error completion so that user space is able to
+		 * poll this completion.
+		 */
+		goto err;
+	}
+
 	if (wqe->mask & WR_LOCAL_OP_MASK) {
 		ret = rxe_do_local_ops(qp, wqe);
 		if (unlikely(ret))
-- 
2.31.1
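
The effect of moving the QP_STATE_ERROR check in patch 2/2 can be sketched with a small model. All names here (`struct toy_qp`, `toy_requester`, the `TOY_QP_*` states) are hypothetical, and the loop is a simplification: in the driver the error path jumps to `err`, runs the completer task and returns, rather than looping. The sketch only contrasts checking the error state before a WQE is fetched with checking it afterwards.

```c
#include <stdbool.h>

enum toy_qp_state { TOY_QP_READY, TOY_QP_ERROR };

struct toy_qp {
	enum toy_qp_state state;
	unsigned int pending;	/* WQEs posted but not yet fetched */
	unsigned int flushed;	/* flush-error completions generated */
	unsigned int sent;	/* WQEs processed normally */
};

/* Run the requester until it has nothing more to do. */
static void toy_requester(struct toy_qp *qp, bool check_after_fetch)
{
	for (;;) {
		/* Old ordering: bail out before looking at the send queue. */
		if (!check_after_fetch && qp->state == TOY_QP_ERROR)
			return;

		/* Stands in for req_next_wqe() returning NULL. */
		if (qp->pending == 0)
			return;
		qp->pending--;

		/* New ordering: a WQE is in hand, so it can be flushed. */
		if (qp->state == TOY_QP_ERROR) {
			qp->flushed++;
			continue;
		}
		qp->sent++;
	}
}
```

With three WQEs pending on a QP in the error state, the old ordering leaves all three untouched and generates nothing for user space to poll, while the new ordering turns each of them into a flush-error completion.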





* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-05-16  1:53 [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
  2022-05-16  1:53 ` [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
  2022-05-16  1:53 ` [PATCH v3 2/2] RDMA/rxe: Generate error completion for error requester QP state Li Zhijian
@ 2022-06-07  8:32 ` lizhijian
  2022-06-24 23:39   ` Jason Gunthorpe
  2022-06-25 12:59   ` Yanjun Zhu
  2 siblings, 2 replies; 12+ messages in thread
From: lizhijian @ 2022-06-07  8:32 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma
  Cc: linux-kernel

Hi Jason & Yanjun


I know there are still a few regressions in RXE, but I do wish you could take some time to review these *simple bugfix* patches.
They are not related to the regressions.


Thanks
Zhijian


On 16/05/2022 09:53, Li Zhijian wrote:
> Since RXE always posts RDMA_WRITE successfully, it's observed that
> no more completion occurs after a few incorrect posts. Actually, it
> will block the polling. we can easily reproduce it by the below pattern.
>
> a. post correct RDMA_WRITE
> b. poll completion event
> while true {
>    c. post incorrect RDMA_WRITE(wrong rkey for example)
>    d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
> }
>
>
> Li Zhijian (2):
>    RDMA/rxe: Update wqe_index for each wqe error completion
>    RDMA/rxe: Generate error completion for error requester QP state
>
>   drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>


* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-06-07  8:32 ` [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue lizhijian
@ 2022-06-24 23:39   ` Jason Gunthorpe
  2022-06-25  7:47     ` Li, Zhijian
  2022-06-25 12:59   ` Yanjun Zhu
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2022-06-24 23:39 UTC (permalink / raw)
  To: lizhijian; +Cc: Yanjun Zhu, Haakon Bugge, Cheng Xu, linux-rdma, linux-kernel

On Tue, Jun 07, 2022 at 08:32:40AM +0000, lizhijian@fujitsu.com wrote:
> Hi Jason & Yanjun
> 
> 
> I know there are still a few regressions on RXE, but i do wish you
> could take some time to review these *simple and bugfix* patches
> They are not related to the regressions.

I would like someone familiar with rxe to ack the datapath changes - I
have a very limited knowledge about rxe.

If that is not forthcoming from others in the rxe community then I
will accept confirmation directly from you that the pyverbs tests and
the blktests scenarios have been run and pass for your changes.

Jason


* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-06-24 23:39   ` Jason Gunthorpe
@ 2022-06-25  7:47     ` Li, Zhijian
  0 siblings, 0 replies; 12+ messages in thread
From: Li, Zhijian @ 2022-06-25  7:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yanjun Zhu, Haakon Bugge, Cheng Xu, linux-rdma, linux-kernel


On 6/25/2022 7:39 AM, Jason Gunthorpe wrote:
> On Tue, Jun 07, 2022 at 08:32:40AM +0000, lizhijian@fujitsu.com wrote:
>> Hi Jason & Yanjun
>>
>>
>> I know there are still a few regressions on RXE, but i do wish you
>> could take some time to review these *simple and bugfix* patches
>> They are not related to the regressions.
> I would like someone familiar with rxe to ack the datapath changes

Thanks for your feedback.

Haakon Bugge privately reviewed the datapath changes in the V1 patches
(everything except the commit log) a few weeks ago, for various reasons.

Hey Haakon, could you help review these patches?


>   - I have a very limited knowledge about rxe.
>
> If that is not forthcoming from others in the rxe community then I
> will accept confirmation directly from you that the pyverbs tests and
> the blktests scenarios have been run and pass for your changes.

I confirm that the pyverbs tests and the nvme group of blktests run over
RXE show no regressions after these changes.

Thanks

Zhijian

>
> Jason




* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-06-07  8:32 ` [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue lizhijian
  2022-06-24 23:39   ` Jason Gunthorpe
@ 2022-06-25 12:59   ` Yanjun Zhu
  2022-06-26  3:29     ` Li, Zhijian
  1 sibling, 1 reply; 12+ messages in thread
From: Yanjun Zhu @ 2022-06-25 12:59 UTC (permalink / raw)
  To: lizhijian, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma
  Cc: linux-kernel


On 2022/6/7 16:32, lizhijian@fujitsu.com wrote:
> Hi Jason & Yanjun
>
>
> I know there are still a few regressions on RXE, but i do wish you could take some time to review these *simple and bugfix* patches
> They are not related to the regressions.

There are currently some problems reported by Red Hat and other Linux vendors.

We had better focus on those problems.

Zhu Yanjun

>
>
> Thanks
> Zhijian
>
>
> On 16/05/2022 09:53, Li Zhijian wrote:
>> Since RXE always posts RDMA_WRITE successfully, it's observed that
>> no more completion occurs after a few incorrect posts. Actually, it
>> will block the polling. we can easily reproduce it by the below pattern.
>>
>> a. post correct RDMA_WRITE
>> b. poll completion event
>> while true {
>>     c. post incorrect RDMA_WRITE(wrong rkey for example)
>>     d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
>> }
>>
>>
>> Li Zhijian (2):
>>     RDMA/rxe: Update wqe_index for each wqe error completion
>>     RDMA/rxe: Generate error completion for error requester QP state
>>
>>    drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
>>    1 file changed, 11 insertions(+), 1 deletion(-)
>>


* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-06-25 12:59   ` Yanjun Zhu
@ 2022-06-26  3:29     ` Li, Zhijian
  2022-06-26 10:38       ` yangx.jy
  0 siblings, 1 reply; 12+ messages in thread
From: Li, Zhijian @ 2022-06-26  3:29 UTC (permalink / raw)
  To: Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu, linux-rdma,
	Yang, Xiao
  Cc: linux-kernel


On 6/25/2022 8:59 PM, Yanjun Zhu wrote:
>
> On 2022/6/7 16:32, lizhijian@fujitsu.com wrote:
>> Hi Jason & Yanjun
>>
>>
>> I know there are still a few regressions on RXE, but i do wish you 
>> could take some time to review these *simple and bugfix* patches
>> They are not related to the regressions.
>
> Now there are some problems from Redhat and other Linux Vendors.
>
> We had better focus on these problems.

+ Xiao
I do believe regressions are high priority, and we are very willing to contribute our efforts to improve the stability of RXE :)
Yang Xiao and I tried to reproduce the issues reported on the mailing list, and we also tried to review the corresponding patches.
However, we could not find a unified place, something like Bugzilla, that tracks the issues and their status, and most of
them cannot be reproduced in our local environment. So it is a bit hard for us to review/verify the patches, especially the
large/complicated ones, without the use cases.

BTW, IMO we shouldn't stop reviewing fixes other than those for the recent regressions.

Zhijian

>
> Zhu Yanjun
>
>>
>>
>> Thanks
>> Zhijian
>>
>>
>> On 16/05/2022 09:53, Li Zhijian wrote:
>>> Since RXE always posts RDMA_WRITE successfully, it's observed that
>>> no more completion occurs after a few incorrect posts. Actually, it
>>> will block the polling. we can easily reproduce it by the below 
>>> pattern.
>>>
>>> a. post correct RDMA_WRITE
>>> b. poll completion event
>>> while true {
>>>     c. post incorrect RDMA_WRITE(wrong rkey for example)
>>>     d. poll completion event <<<< block after 2 incorrect RDMA_WRITE 
>>> posts
>>> }
>>>
>>>
>>> Li Zhijian (2):
>>>     RDMA/rxe: Update wqe_index for each wqe error completion
>>>     RDMA/rxe: Generate error completion for error requester QP state
>>>
>>>    drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
>>>    1 file changed, 11 insertions(+), 1 deletion(-)
>>>




* Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue
  2022-06-26  3:29     ` Li, Zhijian
@ 2022-06-26 10:38       ` yangx.jy
  0 siblings, 0 replies; 12+ messages in thread
From: yangx.jy @ 2022-06-26 10:38 UTC (permalink / raw)
  To: lizhijian, Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu,
	linux-rdma
  Cc: linux-kernel

On 2022/6/26 11:29, Li, Zhijian wrote:
> + Xiao
> I do believe regressions are high priority, and we are very willing to
> contribute our efforts to improve the stability of RXE :)
> Yang Xiao and I tried to reproduce the issues reported on the mailing list,
> and we also tried to review the corresponding patches.
> However, we could not find a unified place, something like Bugzilla, that
> tracks the issues and their status, and most of them cannot be reproduced
> in our local environment. So it is a bit hard for us to review/verify the
> patches, especially the large/complicated ones, without the use cases.
> 
> BTW, IMO we shouldn't stop reviewing fixes other than those for the recent regressions.

Agreed.

Besides, this patch set looks good to me.
Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com>

Best Regards,
Xiao Yang
> 
> Zhijian


* Re: [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion
  2022-05-16  1:53 ` [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
@ 2022-06-26 21:51   ` Bob Pearson
  2022-06-27  3:41     ` lizhijian
  0 siblings, 1 reply; 12+ messages in thread
From: Bob Pearson @ 2022-06-26 21:51 UTC (permalink / raw)
  To: Li Zhijian, Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu,
	linux-rdma
  Cc: linux-kernel

On 5/15/22 20:53, Li Zhijian wrote:
> Previously, if user space keeps sending abnormal wqe, queue.prod will
> keep increasing while queue.index doesn't. Once
> queue.index==queue.prod in next round, req_next_wqe() will treat queue
> as empty. In such case, no new completion would be generated.
> 
> Update wqe_index for each wqe completion so that req_next_wqe() can get
> next wqe properly.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index a0d5e57f73c1..8bdd0b6b578f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -773,6 +773,8 @@ int rxe_requester(void *arg)
>  	if (ah)
>  		rxe_put(ah);
>  err:
> +	/* update wqe_index for each wqe completion */
> +	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
>  	wqe->state = wqe_state_error;
>  	__rxe_do_task(&qp->comp.task);
>  

This change looks plausible, but I am not sure if it will make a difference since the qp
will get transitioned to the error state very shortly.

In order for it to matter, the requester must be a ways ahead of the completer in the send queue
and someone must be actively posting new wqes, which will reschedule the requester. Currently it
will fail on the same wqe again unless the error described above occurs, but if we post a new valid
wqe it will get executed even though we have detected an error that should have stopped the qp.

It looks like the intent was to keep the qp in the non error state until all the old
wqes get completed before making the transition. But we should disable the requester
from processing new wqes in this case. That seems like a safer solution to the problem.

Bob



* Re: [PATCH v3 2/2] RDMA/rxe: Generate error completion for error requester QP state
  2022-05-16  1:53 ` [PATCH v3 2/2] RDMA/rxe: Generate error completion for error requester QP state Li Zhijian
@ 2022-06-26 22:42   ` Bob Pearson
  0 siblings, 0 replies; 12+ messages in thread
From: Bob Pearson @ 2022-06-26 22:42 UTC (permalink / raw)
  To: Li Zhijian, Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu,
	linux-rdma
  Cc: linux-kernel

On 5/15/22 20:53, Li Zhijian wrote:
> As per IBTA specification, all subsequent WQEs while QP is in error
> state should be completed with a flush error.
> 
> Here we check QP_STATE_ERROR after req_next_wqe() so that rxe_completer()
> has chance to be called where it will set CQ state to FLUSH ERROR and the
> completion can associate with its WQE.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> V3: unlikely() optimization # Cheng Xu <chengyou@linux.alibaba.com>
>     update commit log # Haakon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 8bdd0b6b578f..c1f1c19f26b2 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -624,7 +624,7 @@ int rxe_requester(void *arg)
>  	rxe_get(qp);
>  
>  next_wqe:
> -	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
> +	if (unlikely(!qp->valid))
>  		goto exit;
>  
>  	if (unlikely(qp->req.state == QP_STATE_RESET)) {
> @@ -646,6 +646,14 @@ int rxe_requester(void *arg)
>  	if (unlikely(!wqe))
>  		goto exit;
>  
> +	if (unlikely(qp->req.state == QP_STATE_ERROR)) {
> +		/*
> +		 * Generate an error completion so that user space is able to
> +		 * poll this completion.
> +		 */
> +		goto err;
> +	}
> +
>  	if (wqe->mask & WR_LOCAL_OP_MASK) {
>  		ret = rxe_do_local_ops(qp, wqe);
>  		if (unlikely(ret))

There may be issues with moving this after the retry check, since the retransmit timer can
fire at any time and may race with the completer setting the error state, resulting in
a retry flow occurring while you are trying to flush out all the wqes. Perhaps it is better
to duplicate setting wqe in the error state check.

Bob


* Re: [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion
  2022-06-26 21:51   ` Bob Pearson
@ 2022-06-27  3:41     ` lizhijian
  0 siblings, 0 replies; 12+ messages in thread
From: lizhijian @ 2022-06-27  3:41 UTC (permalink / raw)
  To: Bob Pearson, Yanjun Zhu, Jason Gunthorpe, Haakon Bugge, Cheng Xu,
	linux-rdma
  Cc: linux-kernel



On 27/06/2022 05:51, Bob Pearson wrote:
> On 5/15/22 20:53, Li Zhijian wrote:
>> Previously, if user space keeps sending abnormal wqe, queue.prod will
>> keep increasing while queue.index doesn't. Once
>> queue.index==queue.prod in next round, req_next_wqe() will treat queue
>> as empty. In such case, no new completion would be generated.
>>
>> Update wqe_index for each wqe completion so that req_next_wqe() can get
>> next wqe properly.
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index a0d5e57f73c1..8bdd0b6b578f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -773,6 +773,8 @@ int rxe_requester(void *arg)
>>   	if (ah)
>>   		rxe_put(ah);
>>   err:
>> +	/* update wqe_index for each wqe completion */
>> +	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
>>   	wqe->state = wqe_state_error;
>>   	__rxe_do_task(&qp->comp.task);
>>   
> This change looks plausible, but I am not sure if it will make a difference since the qp
> will get transitioned to the error state very shortly.
>
> In order for it to matter the requester must be a ways ahead of the completer in the send queue
> and someone be actively posting new wqes which will reschedule the requester. Currently it
> will fail on the same wqe again unless the error described above occurs but if we post a new valid
> wqe it will get executed even though we have detected an error that should have stopped the qp.
>
> It looks like the intent was to keep the qp in the non error state until all the old
> wqes get completed before making the transition.
Not really. My original intent was just to let req_next_wqe() return a wqe whenever the queue is not empty.
Currently, if rxe_requester() keeps taking the error path for some reason, req_next_wqe()
will falsely report the queue as empty in a later round even though the queue is almost full.

BTW, I will review your new private patches.

Thanks
Zhijian

> But we should disable the requester
> from processing new wqes in this case. That seems like a safer solution to the problem.
>
> Bob
>

