linux-rdma.vger.kernel.org archive mirror
* RDMA/rxe: [BUG-REPORT] Incorrect rnr retry behavior
@ 2022-05-07 13:52 Bob Pearson
From: Bob Pearson @ 2022-05-07 13:52 UTC
  To: Zhu Yanjun, Jason Gunthorpe, linux-rdma

(Not related to blktests at all)

When running the Python test suite repeatedly (~50-100 times) I occasionally see RNR retry failures.
With some tracing it turns out that the rxe driver *never* waits for the rnr_nak_timer to expire
before retrying the send queue. Something else triggers the requester tasklet to re-run, so the
send queue is retried much too early and not enough time is allowed. Most of the time the retry
still works with the reduced timeout, but occasionally it fails, which shows up as intermittent
rnr_nak failures. This can be fixed by adding a flag to qp->req indicating that the requester
should wait for the rnr_nak_timer to fire before running again.
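
To make the proposed flag concrete, here is a rough user-space sketch of the gating logic. It is not driver code: struct req_state, the flag name wait_for_rnr_timer, and all function names are made up for illustration, and the timer is simulated by calling a function directly rather than through the kernel timer API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the requester state kept in qp->req (not the real struct). */
struct req_state {
	bool wait_for_rnr_timer;	/* hypothetical flag name */
	int send_attempts;		/* how often the send queue was (re)tried */
};

/* Responder returned an RNR NAK: the driver would arm rnr_nak_timer here. */
static void on_rnr_nak(struct req_state *req)
{
	req->wait_for_rnr_timer = true;
}

/* rnr_nak_timer fired: the requester is allowed to retry again. */
static void on_rnr_timer_fired(struct req_state *req)
{
	req->wait_for_rnr_timer = false;
}

/* Requester task body; returns true if the send queue was actually tried. */
static bool requester_run(struct req_state *req)
{
	if (req->wait_for_rnr_timer)
		return false;	/* early wakeup: leave the send queue alone */
	req->send_attempts++;
	return true;
}

/* Replays the sequence from the report; returns the total send attempts. */
int replay_rnr_sequence(void)
{
	struct req_state req = { .wait_for_rnr_timer = false, .send_attempts = 0 };
	bool ran;

	ran = requester_run(&req);	/* initial send, answered by an RNR NAK */
	assert(ran);
	on_rnr_nak(&req);

	ran = requester_run(&req);	/* something else re-runs the tasklet */
	assert(!ran);
	ran = requester_run(&req);	/* and again, still before the timer */
	assert(!ran);

	on_rnr_timer_fired(&req);
	ran = requester_run(&req);	/* retry happens only after the timer */
	assert(ran);

	return req.send_attempts;
}
```

With the flag in place replay_rnr_sequence() counts two send attempts: the original send and one retry after the timer has fired. The early wakeups in between do not touch the send queue.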

Bob
