All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Pearson <rpearsonhpe@gmail.com>
To: Tom Talpey <tom@talpey.com>,
	jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior
Date: Fri, 13 May 2022 10:33:00 -0500	[thread overview]
Message-ID: <d3ac03f3-f11f-59e1-5dec-d0670b214e72@gmail.com> (raw)
In-Reply-To: <ca817696-530e-f94f-dcfa-68f1980d31eb@talpey.com>

On 5/13/22 08:04, Tom Talpey wrote:
> 
> On 5/12/2022 3:49 PM, Bob Pearson wrote:
>> Currently the completer tasklet when it sets the retransmit timer or the
>> nak timer sets the same flag (qp->req.need_retry) so that if either
>> timer fires it will attempt to perform a retry flow on the send queue.
>> This has the effect of responding to an RNR NAK at the first retransmit
>> timer event which does not allow for the requested rnr timeout.
>>
>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>> prevents a retry flow until the rnr nak timer fires.
> 
> The new name is a little confusing, nobody "needs" an RNR timeout. :)
> But it seems really odd (and maybe fragile) to have two flags. To me,
> setting need_retry to "-1", or anything but 0 or 1, would be better.
> After all, if the RNR NAK timer goes off, the code will generally retry, right? So it seems more logical to merge these.
I am trying to cleanup pyverbs runs which sometimes fail with rnr retry errors.
In this specific case the rnr timeout value is set a lot longer than the retry timeout.
As discussed earlier the retry timer is never cleared so it continuously fires at about
40 times a second. The failing test is intentionally setting up a situation where the
receiver is not ready and then it is. The retry timeout usually acts as though it
were the rnr retry and 'fixes' the problem and the test passes. Since the retry timer
is just randomly firing it sometimes happens too soon and the receiver isn't ready yet
which leads to the failed test.

Logic says that if you receive an rnr nak the target is waiting for that specific send
packet to be resent and no other so we shouldn't be responding to a spurious retry
timeout.

You are correct that the retry sequence can be shared but it shouldn't start until the
rnr timer has fired once an rnr nak is seen. This patch does exactly that by blocking
the retry sequence until the rnr timer fires.

If you don't like 'need_rnr_timeout' perhaps you would accept 'wait_rnr_timeout' instead.
Overloading need_retry is less clear IMHO. You don't need the retry until the rnr timer
has fired then you do. If you really want to just use one variable we are basically
implementing a state machine with 3 states and it should get an enum to define the states.

Once this issue is resolved I will fix the spurious retries to make them a lot less likely.
> 
>> This patch fixes rnr retry errors which can be observed by running the
>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>
>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 138b3e7d3a5f..bc668cb211b1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>                   if (qp->comp.rnr_retry != 7)
>>                       qp->comp.rnr_retry--;
>>   -                qp->req.need_retry = 1;
>> -                pr_debug("qp#%d set rnr nak timer\n",
>> -                     qp_num(qp));
>> +                qp->req.need_rnr_timeout = 1;
> 
> Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)
> 
>>                   mod_timer(&qp->rnr_nak_timer,
>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>                           & ~AETH_TYPE_MASK));
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index 62acf890af6c..1c962468714e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>       atomic_set(&qp->ssn, 0);
>>       qp->req.opcode = -1;
>>       qp->req.need_retry = 0;
>> +    qp->req.need_rnr_timeout = 0;
>>       qp->req.noack_pkts = 0;
>>       qp->resp.msn = 0;
>>       qp->resp.opcode = -1;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..770ae4279f73 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>   {
>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>   -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>> +    qp->req.need_retry = 1;
>> +    qp->req.need_rnr_timeout = 0;
> 
> Simply setting need_retry = 1 would suffice, if suggestion accepted.
> 
>>       rxe_run_task(&qp->req.task, 1);
>>   }
>>   @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>           qp->req.need_rd_atomic = 0;
>>           qp->req.wait_psn = 0;
>>           qp->req.need_retry = 0;
>> +        qp->req.need_rnr_timeout = 0;
>>           goto exit;
>>       }
>>   -    if (unlikely(qp->req.need_retry)) {
>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
> 
> This would become (unlikely (qp->req.rnr_retry > 0)) ...
> 
>>           req_retry(qp);
>>           qp->req.need_retry = 0;
>>       }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index e7eff1ca75e9..ab3186478c3f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>       int            need_rd_atomic;
>>       int            wait_psn;
>>       int            need_retry;
>> +    int            need_rnr_timeout;
> 
> Drop
> 
>>       int            noack_pkts;
>>       struct rxe_task        task;
>>   };
>>
>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
> 
> 
> Tom.


  reply	other threads:[~2022-05-13 15:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 19:49 [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior Bob Pearson
2022-05-13  2:40 ` Yanjun Zhu
2022-05-13 13:38   ` Yanjun Zhu
2022-05-13 15:42     ` Bob Pearson
2022-05-13 18:22     ` Bob Pearson
2022-05-14  2:30       ` Yanjun Zhu
2022-05-14  3:25         ` Bob Pearson
2022-05-13 13:04 ` Tom Talpey
2022-05-13 15:33   ` Bob Pearson [this message]
2022-05-13 17:43     ` Tom Talpey
2022-05-13 18:32       ` Bob Pearson
2022-05-13 19:08         ` Tom Talpey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d3ac03f3-f11f-59e1-5dec-d0670b214e72@gmail.com \
    --to=rpearsonhpe@gmail.com \
    --cc=jgg@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=tom@talpey.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.