* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-22 22:33 Walker, Benjamin
  0 siblings, 0 replies; 7+ messages in thread
From: Walker, Benjamin @ 2018-08-22 22:33 UTC (permalink / raw)
  To: spdk


On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> Hello Benjamin,
> There have been a series of changes to the QP error recovery that I'd like to
> follow up:
> 
> ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> ibv_query_qp results in a write syscall to the SNIC driver which is a context
> switch.
> The original code was calling it when the state is known to change; the cached
> value was used everywhere else.

The observation driving this patch was that the cached value was always updated
before being used, so keeping the cached value wasn't saving any calls. Do you
see anywhere in the code on master where a call to ibv_query_qp could be
eliminated? The only one I see that might be questionable is the one in
_spdk_nvmf_rdma_qp_error.

> 
> ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> spdk_nvmf_rdma_recover
> ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't trigger error
> recovery on IBV_EVENT_SQ_DRAINED
> De-allocating the resources associated with the requests being processed by
> SNIC at the time of IBV_EVENT_QP_FATAL is too early.
> The IBV standard requires that SPDK "wait for the Affiliated Asynchronous
> Last WQE Reached Event" before manipulating the QP state.
> I would also assume it is unsafe to return the associated requests and their
> data back to the pools before the "Last WQE Reached" event is received.

I agree, but I think the code on master does that already with just one twist.
We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event would never occur if
an error occurred on an RDMA queue pair that had no outstanding I/O, so we can't
always wait for that event before releasing resources. Instead, when we are
first notified of the error via IBV_EVENT_QP_FATAL, we abort all outstanding
commands and then attempt to do the recovery. The recovery quits early if there
are RDMA operations outstanding though. In that case, we'll later get the
IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same path, but not bail
out early. This was all mostly figured out through trial and error when force
disconnecting the NVMe-oF initiator. If you see any flaws in the logic let me
know so we can get them corrected.
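To make the ordering concrete, here is a minimal model of the flow described above. All names and structures are illustrative stand-ins rather than SPDK's actual code; the point is just that both events funnel into the same recovery path, with only the FATAL pass allowed to bail out early:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery flow; the real SPDK functions
 * (_spdk_nvmf_rdma_qp_error, spdk_nvmf_rdma_qpair_recover) are more
 * involved. */

enum qp_event { QP_FATAL, QP_LAST_WQE_REACHED };

struct model_qpair {
    int outstanding_rdma_ops;  /* WRs still owned by the NIC */
    int outstanding_cmds;      /* commands in flight elsewhere */
    bool recovered;
};

/* Abort everything not held by the NIC. */
static void abort_outstanding(struct model_qpair *qp)
{
    qp->outstanding_cmds = 0;
}

/* Attempt recovery; bail out early while the NIC still owns WRs,
 * unless this is the LAST_WQE_REACHED pass. */
static void try_recover(struct model_qpair *qp, bool allow_bail)
{
    if (allow_bail && qp->outstanding_rdma_ops > 0) {
        return;  /* wait for IBV_EVENT_QP_LAST_WQE_REACHED */
    }
    qp->outstanding_rdma_ops = 0;
    qp->recovered = true;
}

static void handle_event(struct model_qpair *qp, enum qp_event ev)
{
    switch (ev) {
    case QP_FATAL:
        abort_outstanding(qp);
        try_recover(qp, /*allow_bail=*/true);
        break;
    case QP_LAST_WQE_REACHED:
        /* Same path, but do not bail out early. */
        try_recover(qp, /*allow_bail=*/false);
        break;
    }
}
```

Note that a queue pair with no outstanding RDMA operations recovers immediately on the FATAL pass, which is exactly the case where LAST_WQE_REACHED was observed never to arrive.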


> 
> 
> Thanks,
> Philipp
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk



* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-27 15:43 Philipp Skadorov
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Skadorov @ 2018-08-27 15:43 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> Benjamin
> Sent: Friday, August 24, 2018 4:13 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> 
> I submitted the following series of patches to address these (and other)
> issues:
> 
> https://review.gerrithub.io/#/c/spdk/spdk/+/423409/
> 
> Please do take a look and let me know if that looks reasonable.

Looks good, put +1.

> 
> 
> On Thu, 2018-08-23 at 21:54 +0000, Philipp Skadorov wrote:
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> > > Benjamin
> > > Sent: Thursday, August 23, 2018 3:34 PM
> > > To: spdk(a)lists.01.org
> > > Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> > >
> > > > On Thu, 2018-08-23 at 04:08 +0000, Philipp Skadorov wrote:
> > > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > > > > Walker, Benjamin
> > > > >
> > > > > On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > > > > > Hello Benjamin,
> > > > > > There have been a series of changes to the QP error recovery
> > > > > > that I'd like to follow up:
> > > > > >
> > > > > > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> > > > >
> > > > > ibv_query_qp
> > > > > > results in a write syscall to the SNIC driver which is a
> > > > > > context switch.
> > > > > > The original code was calling it when the state is known to
> > > > > > change; the cached value was used everywhere else.
> > > > >
> > > > > The observation driving this patch was that the cached value was
> > > > > always updated before being used, so keeping the cached value
> > > > > wasn't saving any calls. Do you see anywhere in the code on
> > > > > master where a call to ibv_query_qp could be eliminated?
> > > >
> > > > I thought it is enough to update the qp state after a QP async
> > > > event is received only.
> > > > Do you see the cases when it is not?
> > >
> > > So you're saying we probably don't need the one at the top of
> > > spdk_nvmf_rdma_qpair_recover() nor the one in
> > > _spdk_nvmf_rdma_qp_error() because any unsolicited state change will
> > > occur only along with an associated asynchronous event. I think I
> > > agree with that statement. I'd revert the patch that changed this,
> > > but the code has drifted a bit, so I'll probably have to cook up a new one.
> >
> > Fair enough.
> >
> > >
> > > >
> > > > > The only one I see that might be questionable is the one in
> > > > > _spdk_nvmf_rdma_qp_error.
> > > > >
> > > > > >
> > > > > > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained
> and
> > > > > > spdk_nvmf_rdma_recover
> > > > > > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > > > > > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't
> > > > > > trigger error recovery on IBV_EVENT_SQ_DRAINED De-allocating
> > > > > > the resources associated with the requests being processed by
> > > > > > SNIC at the time of IBV_EVENT_QP_FATAL is too early.
> > > > > > The IBV standard requires SPDK should "wait for the Affiliated
> > > > > > Asynchronous Last WQE Reached Event" before manipulating with
> > > > > > the QP
> > > > >
> > > > > state.
> > > > > > Would also assume it is unsafe to return the associated
> > > > > > requests and their data back to the pools before "Last WQE
> > > > > > Reached" event is
> > >
> > > received.
> > > > >
> > > > > I agree, but I think the code on master does that already with
> > > > > just one twist.
> > > > > We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event
> > >
> > > would never
> > > > > occur if an error occurred on an RDMA queue pair which had no
> > > > > outstanding I/O. So we can't always wait for that event before
> > > > > releasing resources. Instead, when we first are notified of the
> > > > > error via IBV_EVENT_QP_FATAL, we abort all outstanding commands
> > > > > and then attempt to do the recovery. The recovery quits early if
> > > > > there are RDMA operations outstanding though.
> > > >
> > > > Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
> > > > 2085         _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);
> > > >
> > > > It seems to be called at all times once qp is in error state.
> > >
> > > Is the suggestion here to not call
> > > _spdk_nvmf_rdma_qp_cleanup_all_states
> > > until after the check for whether there is I/O outstanding? I'm not
> > > sure I'm interpreting what you're saying correctly, but if that's
> > > what you're saying I think I agree with it.
> >
> > Sorry, I was not clear - let me rephrase it.
> > _spdk_nvmf_rdma_qp_cleanup_all_states cleans up requests of
> > three types:
> >
> > RDMA-held requests:
> > RDMA_REQUEST_STATE_TRANSFERRING_HOST_TO_CONTROLLER,
> > RDMA_REQUEST_STATE_TRANSFERRING_CONTROLLER_TO_HOST,
> > RDMA_REQUEST_STATE_COMPLETING
> >
> > The requests that could be cleaned up at any time:
> > RDMA_REQUEST_STATE_NEW,
> > RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING,
> > RDMA_REQUEST_STATE_NEED_BUFFER,
> >
> > BDEV-held requests (that are being processed by bdev layer):
> > RDMA_REQUEST_STATE_EXECUTING,
> >
> > The function _spdk_nvmf_rdma_qp_cleanup_all_states should be split.
> > * RDMA-held requests should be cleaned up only after
> > IBV_EVENT_QP_LAST_WQE_REACHED.
> > * BDEV-held requests should be awaited to complete naturally (unless
> > you know how to force-close them)
> > * " The requests that could be cleaned up at any time " could be
> > cleaned up at any time :)
> >
> > >
> > > > > In that case, we'll later get the IBV_EVENT_QP_LAST_WQE_REACHED
> > > > > event and go through the same
> > >
> > > path,
> > > > > but not bail out early. This was all mostly figured out through
> > > > > trial and error when force disconnecting the NVMe-oF initiator.
> > > > > If you see any flaws in the logic let me know so we can get them
> > > > > corrected.
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Philipp
> > >


* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-24 20:13 Walker, Benjamin
  0 siblings, 0 replies; 7+ messages in thread
From: Walker, Benjamin @ 2018-08-24 20:13 UTC (permalink / raw)
  To: spdk


I submitted the following series of patches to address these (and other) issues:

https://review.gerrithub.io/#/c/spdk/spdk/+/423409/

Please do take a look and let me know if that looks reasonable.


On Thu, 2018-08-23 at 21:54 +0000, Philipp Skadorov wrote:
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> > Benjamin
> > Sent: Thursday, August 23, 2018 3:34 PM
> > To: spdk(a)lists.01.org
> > Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> > 
> > > On Thu, 2018-08-23 at 04:08 +0000, Philipp Skadorov wrote:
> > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> > > > Benjamin
> > > > 
> > > > On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > > > > Hello Benjamin,
> > > > > There have been a series of changes to the QP error recovery that
> > > > > I'd like to follow up:
> > > > > 
> > > > > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> > > > 
> > > > ibv_query_qp
> > > > > results in a write syscall to the SNIC driver which is a context
> > > > > switch.
> > > > > The original code was calling it when the state is known to
> > > > > change; the cached value was used everywhere else.
> > > > 
> > > > The observation driving this patch was that the cached value was
> > > > always updated before being used, so keeping the cached value wasn't
> > > > saving any calls. Do you see anywhere in the code on master where a
> > > > call to ibv_query_qp could be eliminated?
> > > 
> > > I thought it is enough to update the qp state after a QP async event
> > > is received only.
> > > Do you see the cases when it is not?
> > 
> > So you're saying we probably don't need the one at the top of
> > spdk_nvmf_rdma_qpair_recover() nor the one in
> > _spdk_nvmf_rdma_qp_error() because any unsolicited state change will
> > occur only along with an associated asynchronous event. I think I agree with
> > that statement. I'd revert the patch that changed this, but the code has
> > drifted a bit, so I'll probably have to cook up a new one.
> 
> Fair enough.
> 
> > 
> > > 
> > > > The only one I see that might be
> > > > questionable is the one in _spdk_nvmf_rdma_qp_error.
> > > > 
> > > > > 
> > > > > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> > > > > spdk_nvmf_rdma_recover
> > > > > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > > > > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't
> > > > > trigger error recovery on IBV_EVENT_SQ_DRAINED De-allocating the
> > > > > resources associated with the requests being processed by SNIC at
> > > > > the time of IBV_EVENT_QP_FATAL is too early.
> > > > > The IBV standard requires SPDK should "wait for the Affiliated
> > > > > Asynchronous Last WQE Reached Event" before manipulating with the
> > > > > QP
> > > > 
> > > > state.
> > > > > Would also assume it is unsafe to return the associated requests
> > > > > and their data back to the pools before "Last WQE Reached" event is
> > 
> > received.
> > > > 
> > > > I agree, but I think the code on master does that already with just
> > > > one twist.
> > > > We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event
> > 
> > would never
> > > > occur if an error occurred on an RDMA queue pair which had no
> > > > outstanding I/O. So we can't always wait for that event before
> > > > releasing resources. Instead, when we first are notified of the
> > > > error via IBV_EVENT_QP_FATAL, we abort all outstanding commands and
> > > > then attempt to do the recovery. The recovery quits early if there
> > > > are RDMA operations outstanding though.
> > > 
> > > Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
> > > 2085         _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);
> > > 
> > > It seems to be called at all times once qp is in error state.
> > 
> > Is the suggestion here to not call _spdk_nvmf_rdma_qp_cleanup_all_states
> > until after the check for whether there is I/O outstanding? I'm not sure I'm
> > interpreting what you're saying correctly, but if that's what you're saying
> > I
> > think I agree with it.
> 
> Sorry, I was not clear - let me rephrase it.
> _spdk_nvmf_rdma_qp_cleanup_all_states cleans up requests of
> three types:
> 		
> RDMA-held requests:
> RDMA_REQUEST_STATE_TRANSFERRING_HOST_TO_CONTROLLER,
> RDMA_REQUEST_STATE_TRANSFERRING_CONTROLLER_TO_HOST,
> RDMA_REQUEST_STATE_COMPLETING
> 
> The requests that could be cleaned up at any time:
> RDMA_REQUEST_STATE_NEW,
> RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING,
> RDMA_REQUEST_STATE_NEED_BUFFER,
> 
> BDEV-held requests (that are being processed by bdev layer):
> RDMA_REQUEST_STATE_EXECUTING,
> 
> The function _spdk_nvmf_rdma_qp_cleanup_all_states should be split. 
> * RDMA-held requests should be cleaned up only after
> IBV_EVENT_QP_LAST_WQE_REACHED.
> * BDEV-held requests should be awaited to complete naturally (unless you know
> how to force-close them)
> * " The requests that could be cleaned up at any time " could be cleaned up at
> any time :)
> 
> > 
> > > > In that case, we'll later get the
> > > > IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same
> > 
> > path,
> > > > but not bail out early. This was all mostly figured out through
> > > > trial and error when force disconnecting the NVMe-oF initiator. If
> > > > you see any flaws in the logic let me know so we can get them
> > > > corrected.
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Philipp
> > 



* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-23 21:54 Philipp Skadorov
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Skadorov @ 2018-08-23 21:54 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> Benjamin
> Sent: Thursday, August 23, 2018 3:34 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> 
> > On Thu, 2018-08-23 at 04:08 +0000, Philipp Skadorov wrote:
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> > > Benjamin
> > >
> > > On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > > > Hello Benjamin,
> > > > There have been a series of changes to the QP error recovery that
> > > > I'd like to follow up:
> > > >
> > > > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> > >
> > > ibv_query_qp
> > > > results in a write syscall to the SNIC driver which is a context
> > > > switch.
> > > > The original code was calling it when the state is known to
> > > > change; the cached value was used everywhere else.
> > >
> > > The observation driving this patch was that the cached value was
> > > always updated before being used, so keeping the cached value wasn't
> > > saving any calls. Do you see anywhere in the code on master where a
> > > call to ibv_query_qp could be eliminated?
> >
> > I thought it is enough to update the qp state after a QP async event
> > is received only.
> > Do you see the cases when it is not?
> 
> So you're saying we probably don't need the one at the top of
> spdk_nvmf_rdma_qpair_recover() nor the one in
> _spdk_nvmf_rdma_qp_error() because any unsolicited state change will
> occur only along with an associated asynchronous event. I think I agree with
> that statement. I'd revert the patch that changed this, but the code has
> drifted a bit, so I'll probably have to cook up a new one.

Fair enough.

> 
> >
> > > The only one I see that might be
> > > questionable is the one in _spdk_nvmf_rdma_qp_error.
> > >
> > > >
> > > > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> > > > spdk_nvmf_rdma_recover
> > > > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > > > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't
> > > > trigger error recovery on IBV_EVENT_SQ_DRAINED De-allocating the
> > > > resources associated with the requests being processed by SNIC at
> > > > the time of IBV_EVENT_QP_FATAL is too early.
> > > > The IBV standard requires SPDK should "wait for the Affiliated
> > > > Asynchronous Last WQE Reached Event" before manipulating with the
> > > > QP
> > >
> > > state.
> > > > Would also assume it is unsafe to return the associated requests
> > > > and their data back to the pools before "Last WQE Reached" event is
> received.
> > >
> > > I agree, but I think the code on master does that already with just
> > > one twist.
> > > We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event
> would never
> > > occur if an error occurred on an RDMA queue pair which had no
> > > outstanding I/O. So we can't always wait for that event before
> > > releasing resources. Instead, when we first are notified of the
> > > error via IBV_EVENT_QP_FATAL, we abort all outstanding commands and
> > > then attempt to do the recovery. The recovery quits early if there
> > > are RDMA operations outstanding though.
> >
> > Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
> > 2085         _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);
> >
> > It seems to be called at all times once qp is in error state.
> 
> Is the suggestion here to not call _spdk_nvmf_rdma_qp_cleanup_all_states
> until after the check for whether there is I/O outstanding? I'm not sure I'm
> interpreting what you're saying correctly, but if that's what you're saying I
> think I agree with it.

Sorry, I was not clear - let me rephrase it.
_spdk_nvmf_rdma_qp_cleanup_all_states cleans up requests of three types:

RDMA-held requests:
RDMA_REQUEST_STATE_TRANSFERRING_HOST_TO_CONTROLLER, RDMA_REQUEST_STATE_TRANSFERRING_CONTROLLER_TO_HOST,
RDMA_REQUEST_STATE_COMPLETING

The requests that could be cleaned up at any time:
RDMA_REQUEST_STATE_NEW,
RDMA_REQUEST_STATE_DATA_TRANSFER_PENDING,
RDMA_REQUEST_STATE_NEED_BUFFER,

BDEV-held requests (that are being processed by bdev layer):
RDMA_REQUEST_STATE_EXECUTING,

The function _spdk_nvmf_rdma_qp_cleanup_all_states should be split:
* RDMA-held requests should be cleaned up only after IBV_EVENT_QP_LAST_WQE_REACHED.
* BDEV-held requests should be allowed to complete naturally (unless you know how to force-close them).
* The requests that "could be cleaned up at any time" could indeed be cleaned up at any time :)
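A rough sketch of what the split could look like. The state names mirror SPDK's, but the helpers and data layout here are hypothetical, just to show the three buckets and when each becomes safe to release:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of splitting the cleanup by request state.
 * State names mirror SPDK's; everything else is illustrative. */

enum rdma_request_state {
    STATE_NEW,
    STATE_NEED_BUFFER,
    STATE_DATA_TRANSFER_PENDING,
    STATE_TRANSFERRING_H2C,
    STATE_TRANSFERRING_C2H,
    STATE_COMPLETING,
    STATE_EXECUTING,
    STATE_FREED
};

/* Requests the NIC may still touch via posted WRs. */
static bool rdma_held(enum rdma_request_state s)
{
    return s == STATE_TRANSFERRING_H2C ||
           s == STATE_TRANSFERRING_C2H ||
           s == STATE_COMPLETING;
}

/* Requests currently owned by the bdev layer. */
static bool bdev_held(enum rdma_request_state s)
{
    return s == STATE_EXECUTING;
}

/* Clean up only what is safe at this point in the error flow:
 * - RDMA-held requests wait for LAST_WQE_REACHED;
 * - bdev-held requests are left to complete naturally;
 * - everything else is released immediately. */
static void cleanup(enum rdma_request_state *reqs, int n,
                    bool last_wqe_reached)
{
    for (int i = 0; i < n; i++) {
        if (bdev_held(reqs[i])) {
            continue;            /* let the bdev layer finish it */
        }
        if (rdma_held(reqs[i]) && !last_wqe_reached) {
            continue;            /* NIC may still DMA into it */
        }
        reqs[i] = STATE_FREED;
    }
}
```

Under this split, the call currently at the top of the error path would pass last_wqe_reached=false, and the LAST_WQE_REACHED handler would pass true.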

> 
> > > In that case, we'll later get the
> > > IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same
> path,
> > > but not bail out early. This was all mostly figured out through
> > > trial and error when force disconnecting the NVMe-oF initiator. If
> > > you see any flaws in the logic let me know so we can get them
> > > corrected.
> > >
> > > >
> > > > Thanks,
> > > > Philipp


* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-23 19:33 Walker, Benjamin
  0 siblings, 0 replies; 7+ messages in thread
From: Walker, Benjamin @ 2018-08-23 19:33 UTC (permalink / raw)
  To: spdk


> On Thu, 2018-08-23 at 04:08 +0000, Philipp Skadorov wrote:
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> > Benjamin
> > 
> > On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > > Hello Benjamin,
> > > There have been a series of changes to the QP error recovery that I'd
> > > like to follow up:
> > > 
> > > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> > 
> > ibv_query_qp
> > > results in a write syscall to the SNIC driver which is a context
> > > switch.
> > > The original code was calling it when the state is known to change;
> > > the cached value was used everywhere else.
> > 
> > The observation driving this patch was that the cached value was always
> > updated before being used, so keeping the cached value wasn't saving any
> > calls. Do you see anywhere in the code on master where a call to
> > ibv_query_qp could be eliminated? 
> 
> I thought it is enough to update the qp state after a QP async event is
> received only.
> Do you see the cases when it is not?

So you're saying we probably don't need the one at the top of
spdk_nvmf_rdma_qpair_recover() nor the one in _spdk_nvmf_rdma_qp_error() because
any unsolicited state change will occur only along with an associated
asynchronous event. I think I agree with that statement. I'd revert the patch
that changed this, but the code has drifted a bit, so I'll probably have to cook
up a new one.
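For illustration, the caching scheme we're discussing amounts to something like the following. Everything here is a simulated stand-in (no real ibverbs calls); it just shows the cached state being refreshed only when an async event arrives and read everywhere else, avoiding the per-check syscall:

```c
#include <assert.h>

/* Hypothetical model: cache the QP state, refresh it only on async
 * events. Names are illustrative, not the real ibverbs API. */

enum qp_state { QPS_RTS, QPS_ERR };

static int query_calls;  /* counts simulated ibv_query_qp() syscalls */

/* Stand-in for ibv_query_qp(): expensive, crosses into the driver. */
static enum qp_state query_qp_hw(enum qp_state hw_state)
{
    query_calls++;
    return hw_state;
}

struct cached_qp {
    enum qp_state cached;
};

/* Called once per async event (e.g. on IBV_EVENT_QP_FATAL), since any
 * unsolicited state change arrives with an associated event. */
static void on_async_event(struct cached_qp *qp, enum qp_state hw_state)
{
    qp->cached = query_qp_hw(hw_state);
}

/* Hot path: trust the cache, no syscall. */
static enum qp_state current_state(const struct cached_qp *qp)
{
    return qp->cached;
}
```

The trade-off is exactly the one above: the cache is only correct if every unsolicited state change really is accompanied by an async event.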

> 
> > The only one I see that might be
> > questionable is the one in _spdk_nvmf_rdma_qp_error.
> > 
> > > 
> > > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> > > spdk_nvmf_rdma_recover
> > > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't trigger
> > > error recovery on IBV_EVENT_SQ_DRAINED De-allocating the resources
> > > associated with the requests being processed by SNIC at the time of
> > > IBV_EVENT_QP_FATAL is too early.
> > > The IBV standard requires SPDK should "wait for the Affiliated
> > > Asynchronous Last WQE Reached Event" before manipulating with the QP
> > 
> > state.
> > > Would also assume it is unsafe to return the associated requests and
> > > their data back to the pools before "Last WQE Reached" event is received.
> > 
> > I agree, but I think the code on master does that already with just one
> > twist.
> > We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event would
> > never occur if an error occurred on an RDMA queue pair which had no
> > outstanding I/O. So we can't always wait for that event before releasing
> > resources. Instead, when we first are notified of the error via
> > IBV_EVENT_QP_FATAL, we abort all outstanding commands and then
> > attempt to do the recovery. The recovery quits early if there are RDMA
> > operations outstanding though. 
> 
> Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
> 2085         _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);
> 
> It seems to be called at all times once qp is in error state.

Is the suggestion here to not call _spdk_nvmf_rdma_qp_cleanup_all_states until
after the check for whether there is I/O outstanding? I'm not sure I'm
interpreting what you're saying correctly, but if that's what you're saying I
think I agree with it.

> > In that case, we'll later get the
> > IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same path,
> > but not bail out early. This was all mostly figured out through trial and
> > error
> > when force disconnecting the NVMe-oF initiator. If you see any flaws in the
> > logic let me know so we can get them corrected.
> > 
> > > 
> > > Thanks,
> > > Philipp


* Re: [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-23  4:08 Philipp Skadorov
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Skadorov @ 2018-08-23  4:08 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> Benjamin
> Sent: Wednesday, August 22, 2018 6:34 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] RDMA qp recovery: follow up on latest changes
> 
> On Wed, 2018-08-22 at 22:04 +0000, Philipp Skadorov wrote:
> > Hello Benjamin,
> > There have been a series of changes to the QP error recovery that I'd
> > like to follow up:
> >
> > ** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
> ibv_query_qp
> > results in a write syscall to the SNIC driver which is a context
> > switch.
> > The original code was calling it when the state is known to change;
> > the cached value was used everywhere else.
> 
> The observation driving this patch was that the cached value was always
> updated before being used, so keeping the cached value wasn't saving any
> calls. Do you see anywhere in the code on master where a call to
> ibv_query_qp could be eliminated? 

I thought it would be enough to update the qp state only after a QP async event is received.
Do you see any cases where it is not?

> The only one I see that might be
> questionable is the one in _spdk_nvmf_rdma_qp_error.
> 
> >
> > ** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and
> > spdk_nvmf_rdma_recover
> > ** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
> > ** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't trigger
> > error recovery on IBV_EVENT_SQ_DRAINED De-allocating the resources
> > associated with the requests being processed by SNIC at the time of
> > IBV_EVENT_QP_FATAL is too early.
> > The IBV standard requires SPDK should "wait for the Affiliated
> > Asynchronous Last WQE Reached Event" before manipulating with the QP
> state.
> > Would also assume it is unsafe to return the associated requests and
> > their data back to the pools before "Last WQE Reached" event is received.
> 
> I agree, but I think the code on master does that already with just one twist.
> We observed that the IBV_EVENT_QP_LAST_WQE_REACHED event would
> never occur if an error occurred on an RDMA queue pair which had no
> outstanding I/O. So we can't always wait for that event before releasing
> resources. Instead, when we first are notified of the error via
> IBV_EVENT_QP_FATAL, we abort all outstanding commands and then
> attempt to do the recovery. The recovery quits early if there are RDMA
> operations outstanding though. 

Looking at rdma.c, func _spdk_nvmf_rdma_qp_error:
2085         _spdk_nvmf_rdma_qp_cleanup_all_states(rqpair);

It seems to be called unconditionally once the qp is in the error state.

> In that case, we'll later get the
> IBV_EVENT_QP_LAST_WQE_REACHED event and go through the same path,
> but not bail out early. This was all mostly figured out through trial and error
> when force disconnecting the NVMe-oF initiator. If you see any flaws in the
> logic let me know so we can get them corrected.
> 
> 
> >
> >
> > Thanks,
> > Philipp
> >


* [SPDK] RDMA qp recovery: follow up on latest changes
@ 2018-08-22 22:04 Philipp Skadorov
  0 siblings, 0 replies; 7+ messages in thread
From: Philipp Skadorov @ 2018-08-22 22:04 UTC (permalink / raw)
  To: spdk


Hello Benjamin,
There has been a series of changes to the QP error recovery that I'd like to follow up on:

** c3756ae3 nvmf: Eliminate spdk_nvmf_rdma_update_ibv_qp
ibv_query_qp results in a write syscall to the SNIC driver, i.e., a context switch.
The original code called it only when the state was known to change; the cached value was used everywhere else.

** 65a512c6 nvmf/rdma: Combine spdk_nvmf_rdma_qp_drained and spdk_nvmf_rdma_recover
** 3bec6601 nvmf/rdma: Simplify spdk_nvmf_rdma_qp_drained
** a9b9f0952d6a0c1a37e544ef2977e7db136a8e86 nvmf/rdma: Don't trigger error recovery on IBV_EVENT_SQ_DRAINED
De-allocating the resources associated with requests still being processed by the SNIC at the time of IBV_EVENT_QP_FATAL is too early.
The IBV standard requires that SPDK "wait for the Affiliated Asynchronous Last WQE Reached Event" before manipulating the QP state.
I would also assume it is unsafe to return the associated requests and their data to the pools before the "Last WQE Reached" event is received.


Thanks,
Philipp



Thread overview: 7+ messages
2018-08-22 22:33 [SPDK] RDMA qp recovery: follow up on latest changes Walker, Benjamin
2018-08-27 15:43 Philipp Skadorov
2018-08-24 20:13 Walker, Benjamin
2018-08-23 21:54 Philipp Skadorov
2018-08-23 19:33 Walker, Benjamin
2018-08-23  4:08 Philipp Skadorov
2018-08-22 22:04 Philipp Skadorov
