linux-rdma.vger.kernel.org archive mirror
* [PATCH for-next 0/2] Fix crash due to sleepy mutex while holding lock in post_{send|recv|poll}
@ 2019-10-28  9:45 Yixian Liu
  2019-10-28  9:45 ` [PATCH for-next 1/2] RDMA/hns: Add the workqueue framework for flush cqe handler Yixian Liu
  2019-10-28  9:45 ` [PATCH for-next 2/2] RDMA/hns: Delayed flush cqe process with workqueue Yixian Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Yixian Liu @ 2019-10-28  9:45 UTC (permalink / raw)
  To: dledford, jgg, leon; +Cc: linux-rdma, linuxarm

Background:
HiP08 RoCE hardware cannot flush outstanding WQEs when a QP enters the
error state (a known hardware problem). As a workaround, the driver must
perform the flush itself whenever it detects that the QP is in the error
state on data-path legs such as post send and post receive [1].

These data-path legs may be called concurrently from different contexts,
both thread and interrupt (for example by the NVMe driver), so they must
be protected by spinlocks. This locking already exists in the driver.

Problem:
The earlier patch [1], sent to work around the hardware limitation
described in the background section, had a bug in the software flush leg.
It acquired a mutex while modifying the QP state to error and while
conveying that state to the hardware through the mailbox. As a result the
leg could sleep while holding a spinlock, which caused a crash.

Suggested Solution:
This patch set defers the flushing of a QP in the error state to a
workqueue.

We understand that this may lengthen recovery times, since scheduling of
the workqueue handler depends on how busy the system is. To roughly
mitigate this effect, we use a concurrency-managed workqueue, which gives
the worker thread (and hence the handler) a chance to run on more than
one core.
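
For readers who want to see the deferral pattern in isolation, here is a
minimal userspace sketch of it. It is an analogy only, not the code from
these patches: it uses pthreads instead of kernel primitives, and every
name in it (demo_qp, demo_post_send, demo_flush_worker and so on) is
hypothetical. A busy-wait lock built on atomic_flag plays the role of the
data-path spinlock, cmd_mutex plays the role of the sleeping mailbox
mutex, and a semaphore plus a pending flag stands in for queueing a work
item. The post path only marks the flush as pending and wakes the worker;
the worker is the only place that takes the sleeping lock.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a QP; no name here comes from the hns driver. */
struct demo_qp {
	atomic_flag     sq_lock;      /* busy-wait lock: the holder must not sleep */
	pthread_mutex_t cmd_mutex;    /* sleeping lock, stands in for the mailbox mutex */
	sem_t           work_sem;     /* wakes the worker; sem_post() never blocks */
	atomic_bool     flush_queued; /* true while a flush request is pending */
	atomic_bool     stop;         /* tells the worker to exit */
	bool            in_error;     /* protected by sq_lock */
	int             flushes_done; /* protected by cmd_mutex */
	pthread_t       worker;
};

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Worker thread: the only context that takes the sleeping mutex. */
static void *demo_flush_worker(void *arg)
{
	struct demo_qp *qp = arg;

	for (;;) {
		if (sem_wait(&qp->work_sem) != 0)
			continue; /* interrupted by a signal, retry */
		/* Clear the pending flag first so a new request can be
		 * queued while this flush is still running. */
		if (atomic_exchange(&qp->flush_queued, false)) {
			pthread_mutex_lock(&qp->cmd_mutex);   /* may sleep: fine here */
			qp->flushes_done++;                   /* placeholder for the flush */
			pthread_mutex_unlock(&qp->cmd_mutex);
		}
		if (atomic_load(&qp->stop))
			break;
	}
	return NULL;
}

static int demo_qp_init(struct demo_qp *qp)
{
	atomic_flag_clear(&qp->sq_lock);
	atomic_init(&qp->flush_queued, false);
	atomic_init(&qp->stop, false);
	qp->in_error = false;
	qp->flushes_done = 0;
	if (pthread_mutex_init(&qp->cmd_mutex, NULL) != 0)
		return -1;
	if (sem_init(&qp->work_sem, 0, 0) != 0)
		return -1;
	return pthread_create(&qp->worker, NULL, demo_flush_worker, qp) ? -1 : 0;
}

static void demo_qp_set_error(struct demo_qp *qp)
{
	demo_spin_lock(&qp->sq_lock);
	qp->in_error = true;
	demo_spin_unlock(&qp->sq_lock);
}

/* Data-path leg. Returns true if the QP was in error and a flush was requested. */
static bool demo_post_send(struct demo_qp *qp)
{
	bool requested = false;

	demo_spin_lock(&qp->sq_lock);
	if (qp->in_error) {
		/* Do not take cmd_mutex here: we hold a spinlock.
		 * Only mark the flush as pending and wake the worker. */
		if (!atomic_exchange(&qp->flush_queued, true))
			sem_post(&qp->work_sem);
		requested = true;
	}
	demo_spin_unlock(&qp->sq_lock);
	return requested;
}

/* Stops the worker; a flush that is still pending is handled before it exits. */
static void demo_qp_destroy(struct demo_qp *qp)
{
	int rc;

	atomic_store(&qp->stop, true);
	sem_post(&qp->work_sem);
	rc = pthread_join(qp->worker, NULL);
	assert(rc == 0);
	(void)rc;
	sem_destroy(&qp->work_sem);
	pthread_mutex_destroy(&qp->cmd_mutex);
}
```

A caller would run demo_qp_init(), move the QP to error with
demo_qp_set_error(), call demo_post_send() from any context, and finish
with demo_qp_destroy(). In kernel code the corresponding building blocks
would be a spinlock, a work item queued with queue_work() (which may be
called from atomic context) and a work handler that is allowed to sleep;
how the patches actually wire these together is not shown here.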


[1] https://patchwork.kernel.org/patch/10534271/


This patch-set consists of:
[Patch 001] Introduce workqueue based WQE Flush Handler
[Patch 002] Call WQE flush handler in post {send|receive|poll}

Yixian Liu (2):
  RDMA/hns: Add the workqueue framework for flush cqe handler
  RDMA/hns: Delayed flush cqe process with workqueue

 drivers/infiniband/hw/hns/hns_roce_device.h |  10 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 100 +++++++++++++++-------------
 drivers/infiniband/hw/hns/hns_roce_qp.c     |  43 ++++++++++++
 3 files changed, 107 insertions(+), 46 deletions(-)

-- 
2.7.4



end of thread, other threads:[~2019-11-09 10:32 UTC | newest]

Thread overview: 7+ messages
2019-10-28  9:45 [PATCH for-next 0/2] Fix crash due to sleepy mutex while holding lock in post_{send|recv|poll} Yixian Liu
2019-10-28  9:45 ` [PATCH for-next 1/2] RDMA/hns: Add the workqueue framework for flush cqe handler Yixian Liu
2019-11-06 20:40   ` Jason Gunthorpe
2019-11-07 12:48     ` Liuyixian (Eason)
2019-11-07 18:28       ` Jason Gunthorpe
2019-11-09 10:30         ` Liuyixian (Eason)
2019-10-28  9:45 ` [PATCH for-next 2/2] RDMA/hns: Delayed flush cqe process with workqueue Yixian Liu
