Re: [SPDK] RDMA qp recovery: follow up on latest changes

* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-22 22:33 Walker, Benjamin
  0 siblings, 0 replies; 7+ messages in thread
From: Walker, Benjamin @ 2018-08-22 22:33 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> Hello Benjamin,
> There have been a series of changes to the QP error recovery that I'd like to
> follow up:
> 
> ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> ibv_query_qp results in a write syscall to the SNIC driver which is a context
> switch.
> The original code was calling it when the state is known to change; the cached
> value was used everywhere else.

The observation driving this patch was that the cached value was always updated
before being used, so keeping the cached value wasn't saving any calls. Do you
see anywhere in the code on master where a call to ibv_query_qp could be
eliminated? The only one I see that might be questionable is the one in
_spdk_nvmf_rdma_qp_error.

> 
> ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> spdk_nvmf_rdma_recover
> ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't trigger error
> recovery on IBV_EVENT_SQ_DRAINED
> De-allocating the resources associated with the requests being processed by
> SNIC at the time of IBV_EVENT_QP_FATAL is too early.
> The IBV standard requires SPDK should "wait for the Affiliated Asynchronous
> Last WQE Reached Event" before manipulating with the QP state.
> Would also assume it is unsafe to return the associated requests and their
> data back to the pools before "Last WQE Reached" event is received.

I agree, but I think the code on master does that already with just one twist.
We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event would never occur if
an error occured on an RDMA queue pair which had no outstanding I/O. So we can't
always wait for that event before releasing resources. Instead, when we first
are notified of the error via IBV_EVENT_QP_FATAL, we abort all outstanding
commands and then attempt to do the recovery. The recovery quits early if there
are RDMA operations outstanding though. In that case, we'll later get the
IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same path, but not bail
out early. This was all mostly figured out through trial and error when force
disconnecting the NVMe-oF initiator. If you see any flaws in the logic let me
know so we can get them corrected.

> 
> 
> Thanks,
> Philipp
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread