All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] RDMA/rxe: Fix no completion event issue
@ 2022-05-11  2:30 Li Zhijian
  2022-05-11  2:30 ` [PATCH v2 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
  2022-05-11  2:30 ` [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state Li Zhijian
  0 siblings, 2 replies; 6+ messages in thread
From: Li Zhijian @ 2022-05-11  2:30 UTC (permalink / raw)
  To: Zhu Yanjun, Jason Gunthorpe, linux-rdma; +Cc: linux-kernel, Li Zhijian

Since RXE always posts RDMA_WRITE successfully, it's observed that
no more completion occurs after a few incorrect posts. Actually, it
will block the polling. we can easily reproduce it by the below pattern.

a. post correct RDMA_WRITE
b. poll completion event
while true {
  c. post incorrect RDMA_WRITE(wrong rkey for example)
  d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
}

Li Zhijian (2):
  RDMA/rxe: Update wqe_index for each wqe error completion
  RDMA/rxe: Generate error completion for error requester state

 drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.31.1




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] RDMA/rxe: Update wqe_index for each wqe error completion
  2022-05-11  2:30 [PATCH v2 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
@ 2022-05-11  2:30 ` Li Zhijian
  2022-05-11  2:30 ` [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state Li Zhijian
  1 sibling, 0 replies; 6+ messages in thread
From: Li Zhijian @ 2022-05-11  2:30 UTC (permalink / raw)
  To: Zhu Yanjun, Jason Gunthorpe, linux-rdma; +Cc: linux-kernel, Li Zhijian

Previously, if user space keeps sending abnormal wqe, queue.prod will
keep increasing while queue.index doesn't. Once
queue.index==queue.prod in next round, req_next_wqe() will treat queue
as empty. In such case, no new completion would be generated.

Update wqe_index for each wqe completion so that req_next_wqe() can get
next wqe properly.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V2: Fix typos in commit logs

 drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index a0d5e57f73c1..8bdd0b6b578f 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -773,6 +773,8 @@ int rxe_requester(void *arg)
 	if (ah)
 		rxe_put(ah);
 err:
+	/* update wqe_index for each wqe completion */
+	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
 	wqe->state = wqe_state_error;
 	__rxe_do_task(&qp->comp.task);
 
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state
  2022-05-11  2:30 [PATCH v2 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
  2022-05-11  2:30 ` [PATCH v2 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
@ 2022-05-11  2:30 ` Li Zhijian
  2022-05-11  3:44   ` Cheng Xu
  1 sibling, 1 reply; 6+ messages in thread
From: Li Zhijian @ 2022-05-11  2:30 UTC (permalink / raw)
  To: Zhu Yanjun, Jason Gunthorpe, linux-rdma; +Cc: linux-kernel, Li Zhijian

SoftRoCE always returns success when user space is posting a new wqe where
it usually just enqueues a wqe.

Once the requester state becomes QP_STATE_ERROR, we should generate error
completion for all subsequent wqe. So the user is able to poll the
completion event to check if the former wqe is handled correctly.

Here we check QP_STATE_ERROR after req_next_wqe() so that the completion
can associate with its wqe.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 8bdd0b6b578f..ed6a486c4343 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -624,7 +624,7 @@ int rxe_requester(void *arg)
 	rxe_get(qp);
 
 next_wqe:
-	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
+	if (unlikely(!qp->valid))
 		goto exit;
 
 	if (unlikely(qp->req.state == QP_STATE_RESET)) {
@@ -646,6 +646,14 @@ int rxe_requester(void *arg)
 	if (unlikely(!wqe))
 		goto exit;
 
+	if (qp->req.state == QP_STATE_ERROR) {
+		/*
+		 * Generate an error completion so that user space is able to
+		 * poll this completion.
+		 */
+		goto err;
+	}
+
 	if (wqe->mask & WR_LOCAL_OP_MASK) {
 		ret = rxe_do_local_ops(qp, wqe);
 		if (unlikely(ret))
-- 
2.31.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state
  2022-05-11  2:30 ` [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state Li Zhijian
@ 2022-05-11  3:44   ` Cheng Xu
  2022-05-11  8:36     ` lizhijian
  0 siblings, 1 reply; 6+ messages in thread
From: Cheng Xu @ 2022-05-11  3:44 UTC (permalink / raw)
  To: Li Zhijian, Zhu Yanjun, Jason Gunthorpe, linux-rdma; +Cc: linux-kernel



On 5/11/22 10:30 AM, Li Zhijian wrote:
> SoftRoCE always returns success when user space is posting a new wqe where
> it usually just enqueues a wqe.
> 
> Once the requester state becomes QP_STATE_ERROR, we should generate error
> completion for all subsequent wqe. So the user is able to poll the
> completion event to check if the former wqe is handled correctly.
> 
> Here we check QP_STATE_ERROR after req_next_wqe() so that the completion
> can associate with its wqe.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 8bdd0b6b578f..ed6a486c4343 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -624,7 +624,7 @@ int rxe_requester(void *arg)
>   	rxe_get(qp);
>   
>   next_wqe:
> -	if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
> +	if (unlikely(!qp->valid))
>   		goto exit;
>   
>   	if (unlikely(qp->req.state == QP_STATE_RESET)) {
> @@ -646,6 +646,14 @@ int rxe_requester(void *arg)
>   	if (unlikely(!wqe))
>   		goto exit;
>   
> +	if (qp->req.state == QP_STATE_ERROR) {
> +		/*
> +		 * Generate an error completion so that user space is able to
> +		 * poll this completion.
> +		 */
> +		goto err;
> +	}
> +

Should this still use unlikely(...) ? Because the original judgement has
a unlikely surrounded.

Cheng Xu

>   	if (wqe->mask & WR_LOCAL_OP_MASK) {
>   		ret = rxe_do_local_ops(qp, wqe);
>   		if (unlikely(ret))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state
  2022-05-11  3:44   ` Cheng Xu
@ 2022-05-11  8:36     ` lizhijian
  2022-05-11 12:50       ` Haakon Bugge
  0 siblings, 1 reply; 6+ messages in thread
From: lizhijian @ 2022-05-11  8:36 UTC (permalink / raw)
  To: Cheng Xu, Zhu Yanjun, Jason Gunthorpe, linux-rdma; +Cc: linux-kernel



On 11/05/2022 11:44, Cheng Xu wrote:
>
>
> On 5/11/22 10:30 AM, Li Zhijian wrote:
>> SoftRoCE always returns success when user space is posting a new wqe where
>> it usually just enqueues a wqe.
>>
>> Once the requester state becomes QP_STATE_ERROR, we should generate error
>> completion for all subsequent wqe. So the user is able to poll the
>> completion event to check if the former wqe is handled correctly.
>>
>> Here we check QP_STATE_ERROR after req_next_wqe() so that the completion
>> can associate with its wqe.
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 8bdd0b6b578f..ed6a486c4343 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -624,7 +624,7 @@ int rxe_requester(void *arg)
>>       rxe_get(qp);
>>     next_wqe:
>> -    if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
>> +    if (unlikely(!qp->valid))
>>           goto exit;
>>         if (unlikely(qp->req.state == QP_STATE_RESET)) {
>> @@ -646,6 +646,14 @@ int rxe_requester(void *arg)
>>       if (unlikely(!wqe))
>>           goto exit;
>>   +    if (qp->req.state == QP_STATE_ERROR) {
>> +        /*
>> +         * Generate an error completion so that user space is able to
>> +         * poll this completion.
>> +         */
>> +        goto err;
>> +    }
>> +
>
> Should this still use unlikely(...) ? Because the original judgement has
> a unlikely surrounded.

Good catch. it sounds good :)


Thanks
Zhijian



>
> Cheng Xu
>
>>       if (wqe->mask & WR_LOCAL_OP_MASK) {
>>           ret = rxe_do_local_ops(qp, wqe);
>>           if (unlikely(ret))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state
  2022-05-11  8:36     ` lizhijian
@ 2022-05-11 12:50       ` Haakon Bugge
  0 siblings, 0 replies; 6+ messages in thread
From: Haakon Bugge @ 2022-05-11 12:50 UTC (permalink / raw)
  To: lizhijian
  Cc: Cheng Xu, Zhu Yanjun, Jason Gunthorpe, OFED mailing list, linux-kernel



> On 11 May 2022, at 10:36, lizhijian@fujitsu.com wrote:
> 
> 
> 
> On 11/05/2022 11:44, Cheng Xu wrote:
>> 
>> 
>> On 5/11/22 10:30 AM, Li Zhijian wrote:
>>> SoftRoCE always returns success when user space is posting a new wqe where
>>> it usually just enqueues a wqe.
>>> 
>>> Once the requester state becomes QP_STATE_ERROR, we should generate error
>>> completion for all subsequent wqe. So the user is able to poll the
>>> completion event to check if the former wqe is handled correctly.

This is not correct. You shall be able to post new send work requests. They shall be completed with FLUSHED_IN_ERROR. As per IBTA C10-42:

Work Requests subsequent to that which caused the Completion Error leading to the transition into the Error state, including those submitted after the transition, must return the Flush Error completion status through the Completion Queue.


Thxs, Håkon

>>> 
>>> Here we check QP_STATE_ERROR after req_next_wqe() so that the completion
>>> can associate with its wqe.
>>> 
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_req.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>> index 8bdd0b6b578f..ed6a486c4343 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>> @@ -624,7 +624,7 @@ int rxe_requester(void *arg)
>>>       rxe_get(qp);
>>>     next_wqe:
>>> -    if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
>>> +    if (unlikely(!qp->valid))
>>>           goto exit;
>>>         if (unlikely(qp->req.state == QP_STATE_RESET)) {
>>> @@ -646,6 +646,14 @@ int rxe_requester(void *arg)
>>>       if (unlikely(!wqe))
>>>           goto exit;
>>>   +    if (qp->req.state == QP_STATE_ERROR) {
>>> +        /*
>>> +         * Generate an error completion so that user space is able to
>>> +         * poll this completion.
>>> +         */
>>> +        goto err;
>>> +    }
>>> +
>> 
>> Should this still use unlikely(...) ? Because the original judgement has
>> a unlikely surrounded.
> 
> Good catch. it sounds good :)
> 
> 
> Thanks
> Zhijian
> 
> 
> 
>> 
>> Cheng Xu
>> 
>>>       if (wqe->mask & WR_LOCAL_OP_MASK) {
>>>           ret = rxe_do_local_ops(qp, wqe);
>>>           if (unlikely(ret))


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-11 12:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11  2:30 [PATCH v2 0/2] RDMA/rxe: Fix no completion event issue Li Zhijian
2022-05-11  2:30 ` [PATCH v2 1/2] RDMA/rxe: Update wqe_index for each wqe error completion Li Zhijian
2022-05-11  2:30 ` [PATCH v2 2/2] RDMA/rxe: Generate error completion for error requester state Li Zhijian
2022-05-11  3:44   ` Cheng Xu
2022-05-11  8:36     ` lizhijian
2022-05-11 12:50       ` Haakon Bugge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.