All of lore.kernel.org
 help / color / mirror / Atom feed
* RDMA/rxe: [BUG-REPORT] Incorrect rnr retry behavior
@ 2022-05-07 13:52 Bob Pearson
  0 siblings, 0 replies; only message in thread
From: Bob Pearson @ 2022-05-07 13:52 UTC (permalink / raw)
  To: Zhu Yanjun, Jason Gunthorpe, linux-rdma

(Not related to blktests at all)

When running the python test suite repeatedly (~50-100X) I occasionally see RNR retry failures.
With some tracing it turns out that the rxe driver *never* waits for the rnr_nak_timer to expire
before retrying the send queue. Something else is triggering the requester tasklet to re-run and
retry the send queue much too early so not enough time is allowed. This can be fixed by adding
a flag to qp->req indicating that the requester should wait for the rnr_nak_timer to fire
before running again. Most of the time it still works with the reduced timeout but occasionally
it fails. This will cause intermittent rnr_nak failures.

Bob

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2022-05-07 13:52 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-07 13:52 RDMA/rxe: [BUG-REPORT] Incorrect rnr retry behavior Bob Pearson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.