* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  6:19 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-24  6:19 UTC (permalink / raw)
  To: spdk


Hi Sasha,

You can see this info:  https://github.com/spdk/spdk/issues/537





Best Regards
Ziye Yang 


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Monday, December 24, 2018 2:13 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

Hi,

If the refcount mechanism doesn't work correctly, it should be fixed. I can't comment on that right now and need to do some homework first.

It would be much appreciated if you could elaborate on what happens in fio when the target receives "Ctrl+C". I suppose "Ctrl+C" sends SIGINT to the process.

So, if the target receives SIGINT, fio crashes. Is that correct?

Best regards

Sasha

On 12/24/2018 7:49 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> The problem I described is stated in:
> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
> It is not about performance but about functionality.
> The first issue is what I described in the patch: the thread we send the message to, qpair->group->thread, is NULL.
> The second issue is that the current usage of inc/dec_refcnt is not exactly correct, because spdk_nvmf_qpair_disconnect can also be called directly from the upper layer (nvmf, for example, when the qpair cannot be added to the polling group). I found that inc_refcnt is only called from one thread, while dec_refcnt can be called from other threads. The current usage seems to assume that qpair destruction is always initiated from the host side, yet it can also be initiated actively by the NVMe-oF target, and in that case we lack the matching inc_refcnt.
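For illustration, the refcount discipline under discussion (every path that may later call dec must first have taken a matching inc, from whichever thread holds the reference) can be sketched with C11 atomics. This is a hypothetical stand-in, not the actual spdk_nvmf_rdma_qpair_inc/dec_refcnt implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical qpair-like object; names are illustrative only. */
struct qpair {
	atomic_int refcnt;
	bool destroyed;
};

struct qpair *qpair_create(void)
{
	struct qpair *q = calloc(1, sizeof(*q));
	atomic_init(&q->refcnt, 1); /* creator holds the initial reference */
	return q;
}

/* Every path that will later drop a reference must take one first,
 * regardless of which thread it runs on. */
void qpair_get(struct qpair *q)
{
	atomic_fetch_add(&q->refcnt, 1);
}

/* Returns true when this put released the last reference. */
bool qpair_put(struct qpair *q)
{
	if (atomic_fetch_sub(&q->refcnt, 1) == 1) {
		q->destroyed = true; /* stand-in for the real teardown */
		return true;
	}
	return false;
}
```

With this discipline it no longer matters whether the host or the target initiated the disconnect: whichever thread performs the last put runs the teardown.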
>
> Thanks.
>
>
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha 
> Kotchubievsky
> Sent: Monday, December 24, 2018 1:40 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> We can't ignore those events.
> We have 3 sources of events that trigger QP closing:
> - rdmacm events
> - driver's async events
> - completions with error status
>
> A disconnect event from rdmacm indicates a graceful shutdown by the initiator. Driver events and completions with error status indicate errors. The target can get only a disconnect event, or a combination of errors and a disconnect (see my example below).
> From a performance perspective, the "data path" (CQ polling) and the "control path" (async events from the driver) shouldn't be mixed in the same thread. Fetching async events involves the kernel and should be separated from the "data path", so some coordination between threads is needed (a refcount, for example).
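The "ignore the second event" idea raised earlier in the thread would amount to a first-event-wins flag, sketched below with a single atomic exchange. Note that this only deduplicates the disconnect trigger; unlike a refcount, it does nothing to protect work still in flight on another thread, which is the coordination being argued for here. Hypothetical names, not SPDK code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Three independent sources (rdmacm event, driver async event,
 * completion with error) may each request teardown of the same QP. */
struct qp_ctx {
	atomic_bool disconnect_claimed;
};

/* Returns true only for the first caller; later events become no-ops. */
bool qp_claim_disconnect(struct qp_ctx *qp)
{
	return !atomic_exchange(&qp->disconnect_claimed, true);
}
```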
>
> It's not clear to me which problem you're dealing with. Maybe it can be solved using the current approach.
>
> Best regards
> Sasha
>
>
> On 12/24/2018 3:27 AM, Yang, Ziye wrote:
>> Hi Sasha,
>>
>> You mentioned: "NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".
>>
>> So my question is: could we ignore one of these events? (I mean, do nothing for one of them.) Since if there is a qpair error, we will eventually destroy the qpair anyway.
>>
>> Thanks.
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha 
>> Kotchubievsky
>> Sent: Sunday, December 23, 2018 10:09 PM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> Hi Ziye,
>>
>> Which problem are you solving: performance, stability, or reducing code complexity? Can you elaborate?
>>
>> If you ask me, the refcount removed by your patch is needed because the qp can be disconnected in response to different events served in different threads. I believe some synchronization is required.
>>
>> For example, the NVME-OF initiator closes the QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates a disconnect event in librdmacm. The target gets 2 events: a completion with an error code and a disconnect event (it doesn't matter in which order). In that case, the qp disconnect in the target can be triggered by two events coming from different sources and possibly handled in different threads.
>>
>> Best regards
>>
>> Sasha
>>
>> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>>> Hi Ben,
>>>
>>> To my knowledge, I still think the inc/dec_refcnt is not needed. I know that qpair management is tied to the group (there is one thread that manages the rdma_cm events and another that does the polling). The cause of this issue is that we use spdk_send_msg to add the qpair to the group during connection setup, but we report the connection as ready to the host side early. So there is a window in which the host sends a disconnect while no group is yet bound to the qpair. My idea is to do this check in spdk_nvmf_qpair_disconnect. If the qpair has not been added to a group, there are two possible reasons:
>>>
>>> 1 The group-add operation hit an issue, so the qpair was freed. In that case we must not handle the disconnect and free the qpair again.
>>> 2 The qpair has simply not been added to the group yet.
>>>
>>> So I suggest that in case 2, when the qpair has not been added yet, we should deny the disconnect operation or delay it. (This concerns an exceptional case, and I think such handling is still reasonable. In normal operation the initiator will not connect and then immediately disconnect without doing anything, so denying the disconnect request this one time is OK; the initiator can disconnect later, and we can make sure a disconnect is only accepted once the qpair is bound to the group. I do not think this strategy for handling the exceptional case will have any side effects.)
>>>
>>> BTW, checking only qpair->group is not sufficient, but we can also use the qpair's state (which is already defined in struct spdk_nvmf_qpair_state).
>>>
>>> I would still like to post some patches that remove the inc/dec_refcnt (maybe as a test) and have them tested by our validation team to see whether this really solves the issue.
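The state check proposed above (deny or defer a disconnect that arrives before the qpair is bound to its poll group) could look roughly like the sketch below. The enum values are illustrative stand-ins, not the actual spdk_nvmf_qpair_state definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states; the real SPDK enum differs. */
enum qpair_state {
	QPAIR_UNINITIALIZED,
	QPAIR_ACTIVATING,   /* group-add message still in flight */
	QPAIR_ACTIVE,       /* bound to a poll group */
	QPAIR_DEACTIVATING,
	QPAIR_ERROR,
};

struct qpair {
	enum qpair_state state;
};

/* Deny (or defer) a disconnect that arrives before the qpair is bound
 * to its poll group; the initiator can retry the disconnect later. */
bool qpair_disconnect_allowed(const struct qpair *q)
{
	return q->state == QPAIR_ACTIVE || q->state == QPAIR_ERROR;
}
```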
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>> -----Original Message-----
>>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
>>> Benjamin
>>> Sent: Friday, December 21, 2018 1:16 AM
>>> To: spdk(a)lists.01.org
>>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>>
>>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>>> Hi all,
>>>>
>>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>>>
>>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>>>> This function is used in nvmf_rdma_disconnect, where there is a description as follows:
>>>>
>>>> /* Read the group out of the qpair. This is normally set and accessed only from
>>>>  * the thread that created the group. Here, we're not on that thread necessarily.
>>>>  * The data member qpair->group begins its life as NULL and then is assigned to
>>>>  * a pointer and never changes. So fortunately reading this and checking for
>>>>  * non-NULL is thread safe in the x86_64 memory model. */
>>>>
>>>> But adding the qpair to a group goes through the function spdk_nvmf_poll_group_add; if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent entering a second time if the qpair has not yet been destroyed due to the async behavior.
>>> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>>>
>>> The RDMA CM Event channel notifies the target of two relevant events 
>>> for a qpair
>>> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while there is another event still in-flight.
>>>
>>> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
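The ownership rule described here, where the management thread never mutates the qpair directly but instead posts a message that the owning poll-group thread executes from its poller loop, can be sketched with a minimal message queue. This single-threaded illustration omits the atomics or locking a real cross-thread queue needs, and the names are illustrative rather than SPDK's:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*msg_fn)(void *ctx);

struct msg {
	msg_fn fn;
	void *ctx;
};

#define QUEUE_DEPTH 64

/* Fixed-size ring buffer of pending messages for one owning thread.
 * A real single-producer/single-consumer queue needs atomic head/tail. */
struct msg_queue {
	struct msg msgs[QUEUE_DEPTH];
	size_t head, tail;
};

/* Called from the management thread: request work, never do it here. */
int queue_send_msg(struct msg_queue *q, msg_fn fn, void *ctx)
{
	if (q->tail - q->head == QUEUE_DEPTH) {
		return -1; /* queue full */
	}
	q->msgs[q->tail % QUEUE_DEPTH] = (struct msg){ fn, ctx };
	q->tail++;
	return 0;
}

/* Called from the poll-group thread's poller loop: drain and execute. */
int queue_poll(struct msg_queue *q)
{
	int processed = 0;
	while (q->head != q->tail) {
		struct msg *m = &q->msgs[q->head % QUEUE_DEPTH];
		m->fn(m->ctx);
		q->head++;
		processed++;
	}
	return processed;
}

/* Sample message body used for demonstration. */
static void bump(void *ctx)
{
	(*(int *)ctx)++;
}
```

Because only the poll-group thread ever executes the messages, qpair state such as the group's qpair list is mutated on one thread only.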
>>>
>>> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>>>
>>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>>
>>>> 2 If so, the following two functions in rdma.c are also unnecessary:
>>>>
>>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>>
>>>>
>>>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect
>>>> directly in every transport, and not use too much async messaging for
>>>> doing this, because I do not think it is necessary.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>> Best Regards
>>>> Ziye Yang
>>>>
>>>> _______________________________________________
>>>> SPDK mailing list
>>>> SPDK(a)lists.01.org
>>>> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2019-01-08  1:45 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2019-01-08  1:45 UTC (permalink / raw)
  To: spdk


Hi Ben,

I revised the patch and submitted it again.




Best Regards
Ziye Yang 


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
Sent: Monday, January 7, 2019 10:18 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

Hi Ziye,

The latest version of your patch (https://review.gerrithub.io/#/c/spdk/spdk/+/437232/) no longer removes the refcnt, which is good. You've caught a real bug here with getting a disconnect async event prior to assigning a qpair to a poll group. I made one suggestion regarding the implementation, but good catch on this one.

Thanks,
Ben



* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2019-01-07 14:18 Walker, Benjamin
  0 siblings, 0 replies; 13+ messages in thread
From: Walker, Benjamin @ 2019-01-07 14:18 UTC (permalink / raw)
  To: spdk


Hi Ziye,

The latest version of your patch (https://review.gerrithub.io/#/c/spdk/spdk/+/437232/) no longer removes the refcnt, which is good. You've caught a real bug here with getting a disconnect async event prior to assigning a qpair to a poll group. I made one suggestion regarding the implementation, but good catch on this one.

Thanks,
Ben

> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Yang, Ziye
> Sent: Monday, December 24, 2018 7:19 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
> 
> Hi Sasha,
> 
> You can see this info:  https://github.com/spdk/spdk/issues/537
> 
> 
> 
> 
> 
> Best Regards
> Ziye Yang
> 
> 
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha
> Kotchubievsky
> Sent: Monday, December 24, 2018 2:13 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
> 
> Hi,
> 
> If refcount mechanism doesn't work correct, it should be fixed. I can't comment
> on that right now and should do some homework before.
> 
> It would be  very appropriated if you elaborate, what happens in fio at receiving
> "ctl+c" in target. I suppose, "ctrl+c" sends SIGINT to the process.
> 
> So, if target receives SIGINT, fio crashes. That's correct ?
> 
> Best regards
> 
> Sasha
> 
> On 12/24/2018 7:49 AM, Yang, Ziye wrote:
> > Hi Sasha,
> >
> > The problem I described is stately in:
> > https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
> >
> > It is not for the performance but for the functionality.
> > The first issue is: what I described in the patch : The thread send by the qpair-
> >group->thread is NULL.
> > The second issue is: The current usage of inc/dec_refcnt is not exactly correct.
> For spdk_nvmf_qpair_disconnect can be also be directly called in the upper layer
> (nvmf, for example, if the qpair cannot be added into the polling group).
> However I found that the inc_refcnt is only called by the same thread, but the
> dec_refcnt can be called by other threads. In the current usage, it seems that we
> suggest that there will be qpair destroy from the host side. However, it can be
> actively called by the NVMe-oF target, however we lack of the matched
> inc_refcnt.
> >
> > Thanks.
> >
> >
> >
> >
> > Best Regards
> > Ziye Yang
> >
> >
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha
> > Kotchubievsky
> > Sent: Monday, December 24, 2018 1:40 PM
> > To: spdk(a)lists.01.org
> > Subject: Re: [SPDK] Some thoughts for the code in rdma.c
> >
> > Hi Ziye,
> >
> > We can't ignore those events.
> > We have 3 sources of events that triggers QP closing:
> > - rdmacm events
> > - driver's anync events
> > - completions with error status
> >
> > Disconnect event from rdmacm is an indication of graceful shutdown from
> initiator. Driver's events and completions with error are indication of errors.
> Target can get only disconnect event, or combination of errors and disconnect
> (see my example below).
> >   From performance perspective, "data path" (CQ polling) and "control
> > path" (async events from driver) shouldn't be mixed in the same
> > thread. Fetching async events involves kernel and should be separated
> > from "data path". So, some coordination between threads is needed
> > (refcount, for example)
> >
> > It's not clear for me, which problem you're dealing with. Maybe it can be
> solved using current approach.
> >
> > Best regards
> > Sasha
> >
> >
> > On 12/24/2018 3:27 AM, Yang, Ziye wrote:
> >> Hi Sasha,
> >>
> >> You mentioned that"
> >>
> >> " NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE
> from target side and only then generate disconnect event in librdmacm. Target
> gets 2 events: completion with error code and disconnect event (doesn't matter
> in which order)".
> >>
> >> So my question is that: Could we ignore one of this event? (I mean do nothing
> for one event.) Since if there is qpair error, we will finally destroy the qpair.
> >>
> >> Thanks.
> >>
> >>
> >>
> >> Best Regards
> >> Ziye Yang
> >>
> >> -----Original Message-----
> >> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha
> >> Kotchubievsky
> >> Sent: Sunday, December 23, 2018 10:09 PM
> >> To: spdk(a)lists.01.org
> >> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
> >>
> >> Hi Ziye,
> >>
> >> Which problem do you solve: performance, stability or remove code
> complexity? Can you elaborate?
> >>
> >> If you ask me, the refcout, removed by your patch, is needed because qp can
> be disconnected in response for different events served in different threads. I
> believe, some synchronization is required.
> >>
> >> For example, NVME-OF initiator closes QP in the middle of
> RDMA_READ/RDMA_WRITE from target side and only then generate disconnect
> event in librdmacm. Target gets 2 events: completion with error code and
> disconnect event (doesn't matter in which order). In that case, qp disconnect in
> target can be triggered by two events coming from different sources and
> possible handled in different threads.
> >>
> >> Best regards
> >>
> >> Sasha
> >>
> >> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
> >>> Hi Ben,
> >>>
> >>> In my view, the inc/dec_refcnt is still not needed. I know that qpair
> >>> management is tied to the group (one thread manages the rdma_cm events
> >>> and another does the polling). The root cause of this issue is that we
> >>> use spdk_send_msg to add the qpair to the group during the connection,
> >>> but we report the connection as ready to the host side early. So there
> >>> is a window in which the host sends the disconnect and no group is yet
> >>> bound to the qpair. My idea is to do these checks in
> >>> spdk_nvmf_qpair_disconnect. If the qpair has not been added to the
> >>> group, there are two possible reasons:
> >>>
> >>> 1 The group-add operation failed, so the qpair was freed. In that case
> >>> we do not need to handle the disconnect and free the qpair again.
> >>> 2 The qpair has simply not been added to the group yet.
> >>>
> >>> So I suggest that if the qpair is not yet added (case 2), we should
> >>> deny or delay the disconnect operation. (This concerns an exceptional
> >>> case, and I think this handling is still reasonable. In normal
> >>> operation the host does not connect and then immediately disconnect
> >>> without doing anything. So denying the disconnect request this time is
> >>> OK; the initiator can retry the disconnect later, and we can make sure
> >>> the disconnect is only accepted once the qpair is bound to the group.
> >>> I do not think this strategy for handling the exceptional case has any
> >>> side effects.)
> >>>
> >>> BTW, checking only qpair->group is not sufficient, but we can use
> >>> other states of the qpair (already defined in struct
> >>> spdk_nvmf_qpair_state).
> >>>
> >>> I would still like to post some patches to remove the inc/dec_refcnt
> >>> (perhaps as a test) and have them tested by our validation team to see
> >>> whether this really solves the issue.
> >>>
> >>>
> >>>
> >>>
> >>> Best Regards
> >>> Ziye Yang
> >>>
> >>> -----Original Message-----
> >>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> >>> Benjamin
> >>> Sent: Friday, December 21, 2018 1:16 AM
> >>> To: spdk(a)lists.01.org
> >>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
> >>>
> >>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
> >>>> Hi all,
> >>>>
> >>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
> >>>>
> >>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is
> necessary.
> >>>> This function is used in nvmf_rdma_disconnect. And there is a
> >>>> description as
> >>>> follows:
> >>>>
> >>>>                   /* Read the group out of the qpair. This is
> >>>> normally set and accessed only from
> >>>>                   * the thread that created the group. Here, we're
> >>>> not on that thread necessarily.
> >>>>                   * The data member qpair->group begins it's life
> >>>> as NULL and then is assigned to
> >>>>                   * a pointer and never changes. So fortunately
> >>>> reading this and checking for
> >>>>                   * non-NULL is thread safe in the x86_64 memory model.
> >>>> */
> >>>>
> >>>>              But adding the qpair to the group happens in this
> >>>> function: spdk_nvmf_poll_group_add. If the group is not ready, we
> >>>> can just call spdk_nvmf_qpair_disconnect in that function. And
> >>>> spdk_nvmf_qpair_disconnect should have a strategy to prevent
> >>>> re-entry if the qpair has not yet been destroyed due to async behavior.
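The comment quoted above leans on the x86-64 memory model to justify a plain cross-thread read of qpair->group. As a side note, the same write-once-pointer pattern can be expressed portably with C11 atomics; the sketch below uses invented names (demo_qpair2, demo_group), not SPDK API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Invented stand-ins, not SPDK types: `group` begins life as NULL and is
 * assigned exactly once; other threads only test it for non-NULL. */
struct demo_group { int dummy; };

struct demo_qpair2 {
	_Atomic(struct demo_group *) group;
};

/* Runs on the thread that owns the poll group. */
static void demo_assign_group(struct demo_qpair2 *q, struct demo_group *g)
{
	atomic_store_explicit(&q->group, g, memory_order_release);
}

/* May run on a different thread (e.g. the CM event thread). */
static int demo_has_group(struct demo_qpair2 *q)
{
	return atomic_load_explicit(&q->group, memory_order_acquire) != NULL;
}
```

On x86-64 the generated code is essentially the same as the plain load; the atomics just make the assumption explicit and portable to weaker memory models.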
> >>> There are two threads involved here typically - a management thread that is
> polling the RDMA CM event channel and listening for IBV async events (which are
> independent things), and the poll group thread that owns the qpair.
> >>>
> >>> The RDMA CM Event channel notifies the target of two relevant events
> >>> for a qpair
> >>> - connect and disconnect. The IBV async event mechanism independently
> notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response
> to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is
> to disconnect the queue pair, but there are scenarios where we could
> legitimately get both events for the same qpair. The processing of both of these
> events involves passing a message to the poll group thread, which is an
> asynchronous process and may take an arbitrarily long time. The refcnt
> coordinates between those two events, such that the first event does not
> destroy the qpair while another event is still in flight.
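The counting invariant described here can be sketched as follows. The names (demo_qpair, demo_qpair_inc_refcnt, demo_qpair_dec_refcnt) are simplified stand-ins for the rdma.c helpers, not the actual SPDK implementation: each racing event takes a reference before messaging the poll group thread, and only the drop of the last reference destroys the qpair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct spdk_nvmf_rdma_qpair: just enough to show
 * the counting invariant. Names are invented for this sketch. */
struct demo_qpair {
	atomic_int refcnt;
	bool destroyed;
};

static void demo_qpair_init(struct demo_qpair *q)
{
	atomic_init(&q->refcnt, 0);
	q->destroyed = false;
}

/* Taken when an event (CM disconnect, IBV_EVENT_QP_FATAL, ...) starts
 * processing, before the message to the poll group thread is sent. */
static void demo_qpair_inc_refcnt(struct demo_qpair *q)
{
	atomic_fetch_add(&q->refcnt, 1);
}

/* Dropped when that event's processing completes. Only the release of the
 * final reference destroys the qpair, so the first event cannot free it
 * while the second is still in flight. Returns true if destroyed. */
static bool demo_qpair_dec_refcnt(struct demo_qpair *q)
{
	if (atomic_fetch_sub(&q->refcnt, 1) == 1) {
		q->destroyed = true; /* stands in for the real destroy path */
		return true;
	}
	return false;
}
```

The real rdma.c helpers also interact with the qpair state machine; this sketch only shows the counting behavior the thread is debating.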
> >>>
> >>> The qpair itself, after the initial set up phase, is also managed by the poll
> group thread. If the user destroys a poll group, or manipulates the poll group to
> add or remove a different qpair, or the qpair gets a completion entry that it
> begins processing, then the qpair data structure is being modified by the poll
> group thread. You can't disconnect it on the management thread while this is
> happening or you risk corrupting some of the data structures - in particular the
> poll group's qpair list.
> >>>
> >>> Checking that the qpair hasn't been assigned to a poll group yet (by looking
> for qpair.group set to NULL) isn't a sufficient protection because a message to
> add the qpair to the poll group may already be in-flight. I made some
> suggestions to use the refcnt to deal with this scenario on your most recent
> patch:
> >>>
> >>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
> >>>
> >>>> 2 If that, the following two functions in rdma.c are also unnecessary:
> >>>>
> >>>> spdk_nvmf_rdma_qpair_inc_refcnt
> >>>> spdk_nvmf_rdma_qpair_dec_refcnt
> >>>>
> >>>>
> >>>> Generally, my idea is that we should call
> >>>> spdk_nvmf_qpair_disconnect directly in every transport, and not
> >>>> use too much async messaging for this, because I do not think it
> is necessary.
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Best Regards
> >>>> Ziye Yang
> >>>>
> >>>> _______________________________________________
> >>>> SPDK mailing list
> >>>> SPDK(a)lists.01.org
> >>>> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  6:13 Sasha Kotchubievsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sasha Kotchubievsky @ 2018-12-24  6:13 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10913 bytes --]

Hi,

If the refcount mechanism doesn't work correctly, it should be fixed. I can't
comment on that right now and need to do some homework first.

It would be much appreciated if you could elaborate on what happens in fio
when the target receives "Ctrl+C". I suppose "Ctrl+C" sends SIGINT to the
process.

So, if the target receives SIGINT, fio crashes. Is that correct?

Best regards

Sasha

On 12/24/2018 7:49 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> The problem I described is stated in: https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
> It is not a performance issue but a functionality issue.
> The first issue is what I described in the patch: the thread used to send the message, qpair->group->thread, is NULL.
> The second issue is that the current usage of inc/dec_refcnt is not entirely correct. spdk_nvmf_qpair_disconnect can also be called directly from the upper layer (nvmf, for example, when the qpair cannot be added to the polling group). However, I found that inc_refcnt is only called from the same thread, while dec_refcnt can be called from other threads. The current usage seems to assume that qpair destruction is always initiated from the host side; however, it can also be initiated by the NVMe-oF target, and in that case we lack the matching inc_refcnt.
>
> Thanks.
>
>
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
> Sent: Monday, December 24, 2018 1:40 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> We can't ignore those events.
> We have three sources of events that trigger QP closing:
> - rdmacm events
> - driver async events
> - completions with error status
>
> The disconnect event from rdmacm indicates a graceful shutdown by the initiator. Driver events and completions with errors indicate failures. The target can get only a disconnect event, or a combination of errors and a disconnect (see my example below).
> From a performance perspective, the "data path" (CQ polling) and the "control path" (async events from the driver) shouldn't be mixed in the same thread. Fetching async events involves the kernel and should be separated from the "data path". So some coordination between threads is needed (a refcount, for example).
>
> It's not clear to me which problem you're dealing with. Maybe it can be solved using the current approach.
>
> Best regards
> Sasha
>
>
> On 12/24/2018 3:27 AM, Yang, Ziye wrote:
>> Hi Sasha,
>>
>> You mentioned that:
>>
>> " NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".
>>
>> So my question is: could we ignore one of these events (i.e., do nothing for one of them)? Since, if there is a qpair error, we will eventually destroy the qpair anyway.
>>
>> Thanks.
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha
>> Kotchubievsky
>> Sent: Sunday, December 23, 2018 10:09 PM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> Hi Ziye,
>>
>> Which problem are you solving: performance, stability, or code complexity? Can you elaborate?
>>
>> If you ask me, the refcount removed by your patch is needed because a qp can be disconnected in response to different events served on different threads. I believe some synchronization is required.
>>
>> For example, the NVMe-oF initiator closes the QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates the disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the qp disconnect in the target can be triggered by two events coming from different sources and possibly handled in different threads.
>>
>> Best regards
>>
>> Sasha
>>
>> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>>> Hi Ben,
>>>
>>> In my view, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (one thread manages the rdma_cm events and another does the polling). The root cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window in which the host sends the disconnect and no group is yet bound to the qpair. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>>>
>>> 1 The group-add operation failed, so the qpair was freed. In that case we do not need to handle the disconnect and free the qpair again.
>>> 2 The qpair has simply not been added to the group yet.
>>>
>>> So I suggest that if the qpair is not yet added (case 2), we should
>>> deny or delay the disconnect operation. (This concerns an exceptional
>>> case, and I think this handling is still reasonable. In normal
>>> operation the host does not connect and then immediately disconnect
>>> without doing anything. So denying the disconnect request this time is
>>> OK; the initiator can retry the disconnect later, and we can make sure
>>> the disconnect is only accepted once the qpair is bound to the group.
>>> I do not think this strategy for handling the exceptional case has any
>>> side effects.)
>>>
>>> BTW, checking only qpair->group is not sufficient, but we can use other states of the qpair (already defined in struct spdk_nvmf_qpair_state).
>>>
>>> I would still like to post some patches to remove the inc/dec_refcnt (perhaps as a test) and have them tested by our validation team to see whether this really solves the issue.
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>> -----Original Message-----
>>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
>>> Benjamin
>>> Sent: Friday, December 21, 2018 1:16 AM
>>> To: spdk(a)lists.01.org
>>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>>
>>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>>> Hi all,
>>>>
>>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>>>
>>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>>>> This function is used in nvmf_rdma_disconnect. And there is a
>>>> description as
>>>> follows:
>>>>
>>>>                   /* Read the group out of the qpair. This is
>>>> normally set and accessed only from
>>>>                   * the thread that created the group. Here, we're
>>>> not on that thread necessarily.
>>>>                   * The data member qpair->group begins it's life as
>>>> NULL and then is assigned to
>>>>                   * a pointer and never changes. So fortunately
>>>> reading this and checking for
>>>>                   * non-NULL is thread safe in the x86_64 memory model.
>>>> */
>>>>
>>>>              But adding the qpair to the group happens in this
>>>> function: spdk_nvmf_poll_group_add. If the group is not ready, we
>>>> can just call spdk_nvmf_qpair_disconnect in that function. And
>>>> spdk_nvmf_qpair_disconnect should have a strategy to prevent
>>>> re-entry if the qpair has not yet been destroyed due to async behavior.
>>> There are two threads involved here typically - a management thread that is polling the RDMA CM event channel and listening for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>>>
>>> The RDMA CM Event channel notifies the target of two relevant events
>>> for a qpair
>>> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while another event is still in flight.
>>>
>>> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>>>
>>> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>>>
>>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>>
>>>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>>>
>>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>>
>>>>
>>>> Generally, my idea is that we should call spdk_nvmf_qpair_disconnect
>>>> directly in every transport, and not use too much async messaging
>>>> for this, because I do not think it is necessary.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>> Best Regards
>>>> Ziye Yang
>>>>
>>>> _______________________________________________
>>>> SPDK mailing list
>>>> SPDK(a)lists.01.org
>>>> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  5:51 Sasha Kotchubievsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sasha Kotchubievsky @ 2018-12-24  5:51 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9205 bytes --]

Hi Ziye,

Yes, that's correct. "Control" events and "IO" should be handled by
different threads. But QP destruction can also be triggered by
completions with errors. Today, QPs in the same polling group use a
shared CQ. We have already submitted a patch that adds a shared receive
QP. Shared resources need some cleanup in the data path.

You can find some details at:
https://github.com/spdk/spdk/commit/90b4bd6cf9bb5805c0c6d8df982ac5f2e3d90cce

Best regards

Sasha

On 12/24/2018 4:01 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> Moreover, the following two functions:
>
> spdk_nvmf_process_cm_event
> spdk_nvmf_process_ib_event
>
> are executed on one CPU core inside the spdk_nvmf_rdma_accept function. So, at least in my mind, the cm_event and ib_event are handled by one thread, and the I/Os can be executed on another thread due to the core allocation strategy in the new_qpair function. So if all qpair destruction is ultimately routed to the same thread, do we still need the lock?
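The "route all destruction to the owning thread" idea amounts to the message-passing pattern SPDK uses elsewhere (spdk_thread_send_msg). Below is a single-threaded toy model with invented names, and with the thread-safe ring that a real implementation needs deliberately elided, to illustrate why the owning thread would then need no lock around the qpair itself:

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded toy model of "route every qpair operation to its owning
 * thread": events enqueue a message, the owning thread's poller drains the
 * queue. Names are invented; in SPDK the analogous call is
 * spdk_thread_send_msg, whose ring is thread safe (elided here). */
typedef void (*demo_msg_fn)(void *ctx);

#define DEMO_QUEUE_DEPTH 8

struct demo_thread {
	demo_msg_fn fns[DEMO_QUEUE_DEPTH];
	void *ctxs[DEMO_QUEUE_DEPTH];
	size_t head, tail;
};

/* Called by whichever thread observed the event. */
static int demo_thread_send_msg(struct demo_thread *t, demo_msg_fn fn, void *ctx)
{
	if (t->tail - t->head == DEMO_QUEUE_DEPTH) {
		return -1; /* queue full */
	}
	t->fns[t->tail % DEMO_QUEUE_DEPTH] = fn;
	t->ctxs[t->tail % DEMO_QUEUE_DEPTH] = ctx;
	t->tail++;
	return 0;
}

/* Runs only on the owning thread, so the messages never race with each
 * other and the qpair itself needs no lock. */
static void demo_thread_poll(struct demo_thread *t)
{
	while (t->head != t->tail) {
		size_t i = t->head % DEMO_QUEUE_DEPTH;
		t->fns[i](t->ctxs[i]);
		t->head++;
	}
}

/* Example message: mark a flag instead of really destroying a qpair. */
static void demo_destroy_qpair(void *ctx)
{
	*(int *)ctx = 1;
}
```

The enqueue side is where the remaining synchronization lives; serializing the handlers removes the need for a lock on the qpair, not the need for a thread-safe queue.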
>
> Thanks.
>
>
>
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: Yang, Ziye
> Sent: Monday, December 24, 2018 9:28 AM
> To: spdk(a)lists.01.org
> Subject: RE: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Sasha,
>
> You mentioned that:
>
> " NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".
>
> So my question is: could we ignore one of these events (i.e., do nothing for one of them)? Since, if there is a qpair error, we will eventually destroy the qpair anyway.
>
> Thanks.
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
> Sent: Sunday, December 23, 2018 10:09 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> Which problem are you solving: performance, stability, or code complexity? Can you elaborate?
>
> If you ask me, the refcount removed by your patch is needed because a qp can be disconnected in response to different events served on different threads. I believe some synchronization is required.
>
> For example, the NVMe-oF initiator closes the QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates the disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the qp disconnect in the target can be triggered by two events coming from different sources and possibly handled in different threads.
>
> Best regards
>
> Sasha
>
> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>> Hi Ben,
>>
>> In my view, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (one thread manages the rdma_cm events and another does the polling). The root cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window in which the host sends the disconnect and no group is yet bound to the qpair. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>>
>> 1 The group-add operation failed, so the qpair was freed. In that case we do not need to handle the disconnect and free the qpair again.
>> 2 The qpair has simply not been added to the group yet.
>>
>> So I suggest that if the qpair is not yet added (case 2), we should
>> deny or delay the disconnect operation. (This concerns an exceptional
>> case, and I think this handling is still reasonable. In normal
>> operation the host does not connect and then immediately disconnect
>> without doing anything. So denying the disconnect request this time is
>> OK; the initiator can retry the disconnect later, and we can make sure
>> the disconnect is only accepted once the qpair is bound to the group.
>> I do not think this strategy for handling the exceptional case has any
>> side effects.)
>>
>> BTW, checking only qpair->group is not sufficient, but we can use other states of the qpair (already defined in struct spdk_nvmf_qpair_state).
>>
>> I would still like to post some patches to remove the inc/dec_refcnt (perhaps as a test) and have them tested by our validation team to see whether this really solves the issue.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
>> Benjamin
>> Sent: Friday, December 21, 2018 1:16 AM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>> Hi all,
>>>
>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>>
>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>>> This function is used in nvmf_rdma_disconnect. And there is a
>>> description as
>>> follows:
>>>
>>>                  /* Read the group out of the qpair. This is normally
>>> set and accessed only from
>>>                  * the thread that created the group. Here, we're not
>>> on that thread necessarily.
>>>                  * The data member qpair->group begins it's life as
>>> NULL and then is assigned to
>>>                  * a pointer and never changes. So fortunately reading
>>> this and checking for
>>>                  * non-NULL is thread safe in the x86_64 memory model.
>>> */
>>>
>>>             But adding the qpair to the group happens in this
>>> function: spdk_nvmf_poll_group_add. If the group is not ready, we
>>> can just call spdk_nvmf_qpair_disconnect in that function. And
>>> spdk_nvmf_qpair_disconnect should have a strategy to prevent
>>> re-entry if the qpair has not yet been destroyed due to async behavior.
>> There are two threads involved here typically - a management thread that is polling the RDMA CM event channel and listening for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>>
>> The RDMA CM Event channel notifies the target of two relevant events
>> for a qpair
>> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while another event is still in flight.
>>
>> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>>
>> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>>
>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>
>>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>>
>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>
>>>
>>> Generally, my idea is that we should call spdk_nvmf_qpair_disconnect
>>> directly in every transport, and not use too much async messaging
>>> for this, because I do not think it is necessary.
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  5:49 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-24  5:49 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10134 bytes --]

Hi Sasha,

The problem I described is stated in: https://review.gerrithub.io/#/c/spdk/spdk/+/437232/

It is not a performance issue but a functionality issue.
The first issue is what I described in the patch: the thread used to send the message, qpair->group->thread, is NULL.
The second issue is that the current usage of inc/dec_refcnt is not entirely correct. spdk_nvmf_qpair_disconnect can also be called directly from the upper layer (nvmf, for example, when the qpair cannot be added to the polling group). However, I found that inc_refcnt is only called from the same thread, while dec_refcnt can be called from other threads. The current usage seems to assume that qpair destruction is always initiated from the host side; however, it can also be initiated by the NVMe-oF target, and in that case we lack the matching inc_refcnt.

Thanks.




Best Regards
Ziye Yang 


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Monday, December 24, 2018 1:40 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

Hi Ziye,

We can't ignore those events.
We have three sources of events that trigger QP closing:
- rdmacm events
- driver async events
- completions with error status

The disconnect event from rdmacm indicates a graceful shutdown by the initiator. Driver events and completions with errors indicate failures. The target can get only a disconnect event, or a combination of errors and a disconnect (see my example below).
From a performance perspective, the "data path" (CQ polling) and the "control path" (async events from the driver) shouldn't be mixed in the same thread. Fetching async events involves the kernel and should be separated from the "data path". So some coordination between threads is needed (a refcount, for example).
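The coordination problem can be pictured in miniature: any of the three trigger sources may race to start the disconnect, so the entry point must be idempotent. The sketch below uses an atomic guard for that (the refcount discussed in this thread serves a related but richer purpose, since it also delays destruction while an event is in flight); all names are invented, not SPDK API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only; names are invented, not SPDK API. The three trigger
 * sources listed above may each race to start the disconnect, so the entry
 * point must be idempotent: the first caller wins, later callers are no-ops. */
enum demo_trigger {
	DEMO_TRIGGER_RDMACM_DISCONNECT, /* graceful shutdown from the initiator */
	DEMO_TRIGGER_IBV_ASYNC_EVENT,   /* e.g. IBV_EVENT_QP_FATAL */
	DEMO_TRIGGER_CQE_ERROR          /* completion with error status */
};

struct demo_qp {
	atomic_bool disconnect_started;
	int disconnect_calls; /* how many times teardown actually began */
};

/* Safe to call from any thread that observes a trigger. */
static bool demo_qp_disconnect(struct demo_qp *qp, enum demo_trigger t)
{
	(void)t; /* the source no longer matters once teardown starts */
	bool expected = false;

	if (atomic_compare_exchange_strong(&qp->disconnect_started,
					   &expected, true)) {
		qp->disconnect_calls++; /* stands in for messaging the poll group */
		return true;
	}
	return false; /* redundant trigger: already disconnecting */
}
```

The compare-and-exchange guarantees exactly one winner regardless of which thread sees its event first; the later triggers still arrive but become no-ops instead of double-freeing the qpair.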

It's not clear to me which problem you're dealing with. Maybe it can be solved using the current approach.

Best regards
Sasha


On 12/24/2018 3:27 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> You mentioned that:
>
> " NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".
>
> So my question is: could we ignore one of these events (i.e., do nothing for one of them)? Since, if there is a qpair error, we will eventually destroy the qpair anyway.
>
> Thanks.
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha 
> Kotchubievsky
> Sent: Sunday, December 23, 2018 10:09 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> Which problem are you solving: performance, stability, or code complexity? Can you elaborate?
>
> If you ask me, the refcount removed by your patch is needed because a qp can be disconnected in response to different events served on different threads. I believe some synchronization is required.
>
> For example, the NVMe-oF initiator closes the QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates the disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the qp disconnect in the target can be triggered by two events coming from different sources and possibly handled in different threads.
>
> Best regards
>
> Sasha
>
> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>> Hi Ben,
>>
>> In my view, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (one thread manages the rdma_cm events and another does the polling). The root cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window in which the host sends the disconnect and no group is yet bound to the qpair. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>>
>> 1 The group-add operation failed, so the qpair was freed. In that case we do not need to handle the disconnect and free the qpair again.
>> 2 The qpair has simply not been added to the group yet.
>>
>> So I suggest that if the qpair is not yet added (case 2), we should
>> deny or delay the disconnect operation. (This concerns an exceptional
>> case, and I think this handling is still reasonable. In normal
>> operation the host does not connect and then immediately disconnect
>> without doing anything. So denying the disconnect request this time is
>> OK; the initiator can retry the disconnect later, and we can make sure
>> the disconnect is only accepted once the qpair is bound to the group.
>> I do not think this strategy for handling the exceptional case has any
>> side effects.)
>>
>> BTW, checking only qpair->group is not sufficient, but we can use other states of the qpair (already defined in struct spdk_nvmf_qpair_state).
>>
>> I would still like to post some patches to remove the inc/dec_refcnt (perhaps as a test) and have them tested by our validation team to see whether this really solves the issue.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
>> Benjamin
>> Sent: Friday, December 21, 2018 1:16 AM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>> Hi all,
>>>
>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>>
>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>>> This function is used in nvmf_rdma_disconnect. And there is a 
>>> description as
>>> follows:
>>>
>>>                  /* Read the group out of the qpair. This is 
>>> normally set and accessed only from
>>>                  * the thread that created the group. Here, we're 
>>> not on that thread necessarily.
>>>                  * The data member qpair->group begins it's life as 
>>> NULL and then is assigned to
>>>                  * a pointer and never changes. So fortunately 
>>> reading this and checking for
>>>                  * non-NULL is thread safe in the x86_64 memory model.
>>> */
>>>
>>> But adding the qpair to a group goes through spdk_nvmf_poll_group_add; if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent a second entry when the qpair has not yet been destroyed due to async behavior.
>> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>>
>> The RDMA CM Event channel notifies the target of two relevant events 
>> for a qpair
>> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while there is another event in-flight still.
>>
>> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>>
>> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>>
>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>
>>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>>
>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>
>>>
>>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly in every transport, and not use too much async messaging for this, because I do not think it is necessary.
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  5:40 Sasha Kotchubievsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sasha Kotchubievsky @ 2018-12-24  5:40 UTC (permalink / raw)
  To: spdk


Hi Ziye,

We can't ignore those events.
We have three sources of events that trigger QP closing:
- rdmacm events
- driver async events
- completions with error status

A disconnect event from rdmacm is an indication of graceful shutdown from the initiator. Driver events and completions with error are indications of errors. The target can get only a disconnect event, or a combination of errors and a disconnect (see my example below).
From a performance perspective, the "data path" (CQ polling) and the "control path" (async events from the driver) shouldn't be mixed in the same thread. Fetching async events involves the kernel and should be separated from the "data path". So, some coordination between threads is needed (a refcount, for example).

It's not clear to me which problem you're dealing with. Maybe it can be solved using the current approach.

Best regards
Sasha


On 12/24/2018 3:27 AM, Yang, Ziye wrote:
> Hi Sasha,
>
> You mentioned that:
>
> "NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".
>
> So my question is: could we ignore one of these events? (I mean, do nothing for one of them.) Since if there is a qpair error, we will eventually destroy the qpair anyway.
>
> Thanks.
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
> Sent: Sunday, December 23, 2018 10:09 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> Hi Ziye,
>
> Which problem are you solving: performance, stability, or reducing code complexity? Can you elaborate?
>
> If you ask me, the refcount removed by your patch is needed because a QP can be disconnected in response to different events served on different threads. I believe some synchronization is required.
>
> For example, the NVMe-oF initiator closes a QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates a disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the QP disconnect in the target can be triggered by two events coming from different sources and possibly handled on different threads.
>
> Best regards
>
> Sasha
>
> On 12/21/2018 3:35 AM, Yang, Ziye wrote:
>> Hi Ben,
>>
>> In my understanding, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (there is one thread that manages the rdma_cm events and another that does the polling). The cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window where the host sends the disconnect and no group is bound to the qpair yet. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>>
>> 1. The group-add operation hit an issue and the qpair was freed, so we do not need to handle the disconnect and free the qpair again.
>> 2. The qpair has simply not been added to the group yet.
>>
>> So I suggest that in case 2 we deny or delay the disconnect operation. (This only concerns the exceptional case, so I think the handling is still reasonable. A normal initiator will not connect and then disconnect without doing anything, so denying the disconnect this time is OK; the initiator can disconnect later, and we can make sure a disconnect is only accepted once the qpair is bound to the group. I do not think this strategy for the exceptional case has any side effects.)
>>
>> BTW, only checking qpair->group is not sufficient, but we can also use the qpair state (already defined in struct spdk_nvmf_qpair_state).
>>
>> I would still like to post some patches that remove the inc/dec_refcnt (maybe as a test) and let our validation team verify whether this really solves the issue.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
>> Benjamin
>> Sent: Friday, December 21, 2018 1:16 AM
>> To: spdk(a)lists.01.org
>> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>>
>> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>>> Hi all,
>>>
>>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>>
>>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>>> This function is used in nvmf_rdma_disconnect. And there is a
>>> description as
>>> follows:
>>>
>>>                  /* Read the group out of the qpair. This is normally
>>> set and accessed only from
>>>                  * the thread that created the group. Here, we're not
>>> on that thread necessarily.
>>>                  * The data member qpair->group begins it's life as
>>> NULL and then is assigned to
>>>                  * a pointer and never changes. So fortunately reading
>>> this and checking for
>>>                  * non-NULL is thread safe in the x86_64 memory model.
>>> */
>>>
>>> But adding the qpair to a group goes through spdk_nvmf_poll_group_add; if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent a second entry when the qpair has not yet been destroyed due to async behavior.
>> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>>
>> The RDMA CM Event channel notifies the target of two relevant events
>> for a qpair
>> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while there is another event in-flight still.
>>
>> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>>
>> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>>
>> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>>
>>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>>
>>> spdk_nvmf_rdma_qpair_inc_refcnt
>>> spdk_nvmf_rdma_qpair_dec_refcnt
>>>
>>>
>>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly in every transport, and not use too much async messaging for this, because I do not think it is necessary.
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>


* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  2:01 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-24  2:01 UTC (permalink / raw)
  To: spdk


Hi Sasha,

Moreover, the following two functions:

spdk_nvmf_process_cm_event
spdk_nvmf_process_ib_event

are executed on one CPU core, inside the spdk_nvmf_rdma_accept function. So, at least in my mind, the cm_event and ib_event are handled by one thread, while the I/Os can be executed on another thread due to the core allocation strategy in the new_qpair function. So if all qpair destruction is ultimately routed to the same thread, do we still need the lock?

Thanks.





Best Regards
Ziye Yang 


-----Original Message-----
From: Yang, Ziye 
Sent: Monday, December 24, 2018 9:28 AM
To: spdk(a)lists.01.org
Subject: RE: [SPDK] Some thoughts for the code in rdma.c

Hi Sasha,

You mentioned that:

"NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".

So my question is: could we ignore one of these events? (I mean, do nothing for one of them.) Since if there is a qpair error, we will eventually destroy the qpair anyway.

Thanks.



Best Regards
Ziye Yang 

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Sunday, December 23, 2018 10:09 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

Hi Ziye,

Which problem are you solving: performance, stability, or reducing code complexity? Can you elaborate?

If you ask me, the refcount removed by your patch is needed because a QP can be disconnected in response to different events served on different threads. I believe some synchronization is required.

For example, the NVMe-oF initiator closes a QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates a disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the QP disconnect in the target can be triggered by two events coming from different sources and possibly handled on different threads.

Best regards

Sasha

On 12/21/2018 3:35 AM, Yang, Ziye wrote:
> Hi Ben,
>
> In my understanding, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (there is one thread that manages the rdma_cm events and another that does the polling). The cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window where the host sends the disconnect and no group is bound to the qpair yet. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>
> 1. The group-add operation hit an issue and the qpair was freed, so we do not need to handle the disconnect and free the qpair again.
> 2. The qpair has simply not been added to the group yet.
>
> So I suggest that in case 2 we deny or delay the disconnect operation. (This only concerns the exceptional case, so I think the handling is still reasonable. A normal initiator will not connect and then disconnect without doing anything, so denying the disconnect this time is OK; the initiator can disconnect later, and we can make sure a disconnect is only accepted once the qpair is bound to the group. I do not think this strategy for the exceptional case has any side effects.)
>
> BTW, only checking qpair->group is not sufficient, but we can also use the qpair state (already defined in struct spdk_nvmf_qpair_state).
>
> I would still like to post some patches that remove the inc/dec_refcnt (maybe as a test) and let our validation team verify whether this really solves the issue.
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
> Benjamin
> Sent: Friday, December 21, 2018 1:16 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>> Hi all,
>>
>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>
>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>> This function is used in nvmf_rdma_disconnect. And there is a 
>> description as
>> follows:
>>
>>                 /* Read the group out of the qpair. This is normally 
>> set and accessed only from
>>                 * the thread that created the group. Here, we're not 
>> on that thread necessarily.
>>                 * The data member qpair->group begins it's life as 
>> NULL and then is assigned to
>>                 * a pointer and never changes. So fortunately reading 
>> this and checking for
>>                 * non-NULL is thread safe in the x86_64 memory model.
>> */
>>
>> But adding the qpair to a group goes through spdk_nvmf_poll_group_add; if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent a second entry when the qpair has not yet been destroyed due to async behavior.
> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>
> The RDMA CM Event channel notifies the target of two relevant events 
> for a qpair
> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while there is another event in-flight still.
>
> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>
> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>
>> spdk_nvmf_rdma_qpair_inc_refcnt
>> spdk_nvmf_rdma_qpair_dec_refcnt
>>
>>
>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly in every transport, and not use too much async messaging for this, because I do not think it is necessary.
>>
>> Thanks.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>


* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-24  1:27 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-24  1:27 UTC (permalink / raw)
  To: spdk


Hi Sasha,

You mentioned that:

"NVME-OF initiator closes QP in the middle of RDMA_READ/RDMA_WRITE from target side and only then generate disconnect event in librdmacm. Target gets 2 events: completion with error code and disconnect event (doesn't matter in which order)".

So my question is: could we ignore one of these events? (I mean, do nothing for one of them.) Since if there is a qpair error, we will eventually destroy the qpair anyway.

Thanks.



Best Regards
Ziye Yang 

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Sunday, December 23, 2018 10:09 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

Hi Ziye,

Which problem are you solving: performance, stability, or reducing code complexity? Can you elaborate?

If you ask me, the refcount removed by your patch is needed because a QP can be disconnected in response to different events served on different threads. I believe some synchronization is required.

For example, the NVMe-oF initiator closes a QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates a disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the QP disconnect in the target can be triggered by two events coming from different sources and possibly handled on different threads.

Best regards

Sasha

On 12/21/2018 3:35 AM, Yang, Ziye wrote:
> Hi Ben,
>
> In my understanding, the inc/dec_refcnt is still not needed. I know that qpair management is tied to the group (there is one thread that manages the rdma_cm events and another that does the polling). The cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window where the host sends the disconnect and no group is bound to the qpair yet. My idea is to do these checks in spdk_nvmf_qpair_disconnect. If the qpair has not been added to the group, there are two possible reasons:
>
> 1. The group-add operation hit an issue and the qpair was freed, so we do not need to handle the disconnect and free the qpair again.
> 2. The qpair has simply not been added to the group yet.
>
> So I suggest that in case 2 we deny or delay the disconnect operation. (This only concerns the exceptional case, so I think the handling is still reasonable. A normal initiator will not connect and then disconnect without doing anything, so denying the disconnect this time is OK; the initiator can disconnect later, and we can make sure a disconnect is only accepted once the qpair is bound to the group. I do not think this strategy for the exceptional case has any side effects.)
>
> BTW, only checking qpair->group is not sufficient, but we can also use the qpair state (already defined in struct spdk_nvmf_qpair_state).
>
> I would still like to post some patches that remove the inc/dec_refcnt (maybe as a test) and let our validation team verify whether this really solves the issue.
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
> Benjamin
> Sent: Friday, December 21, 2018 1:16 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>> Hi all,
>>
>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>
>> 1 I do not think that the _nvmf_rdma_disconnect_retry function is necessary.
>> This function is used in nvmf_rdma_disconnect. And there is a 
>> description as
>> follows:
>>
>>                 /* Read the group out of the qpair. This is normally 
>> set and accessed only from
>>                 * the thread that created the group. Here, we're not 
>> on that thread necessarily.
>>                 * The data member qpair->group begins it's life as 
>> NULL and then is assigned to
>>                 * a pointer and never changes. So fortunately reading 
>> this and checking for
>>                 * non-NULL is thread safe in the x86_64 memory model.
>> */
>>
>> But adding the qpair to a group goes through spdk_nvmf_poll_group_add; if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent a second entry when the qpair has not yet been destroyed due to async behavior.
> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>
> The RDMA CM Event channel notifies the target of two relevant events 
> for a qpair
> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while there is another event in-flight still.
>
> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>
> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
>> 2 If that, the following two functions in rdma.c are also unnecessary:
>>
>> spdk_nvmf_rdma_qpair_inc_refcnt
>> spdk_nvmf_rdma_qpair_dec_refcnt
>>
>>
>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly in every transport, and not use too much async messaging for this, because I do not think it is necessary.
>>
>> Thanks.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>


* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-23 14:09 Sasha Kotchubievsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sasha Kotchubievsky @ 2018-12-23 14:09 UTC (permalink / raw)
  To: spdk


Hi Ziye,

Which problem are you solving: performance, stability, or reducing code complexity? Can you elaborate?

If you ask me, the refcount removed by your patch is needed because a QP can be disconnected in response to different events served on different threads. I believe some synchronization is required.

For example, the NVMe-oF initiator closes a QP in the middle of an RDMA_READ/RDMA_WRITE from the target side and only then generates a disconnect event in librdmacm. The target gets two events: a completion with an error code and a disconnect event (in either order). In that case, the QP disconnect in the target can be triggered by two events coming from different sources and possibly handled on different threads.

Best regards

Sasha

On 12/21/2018 3:35 AM, Yang, Ziye wrote:
> Hi Ben,
>
> According to my knowledge, I think that the inc/dec_refcnt is still not needed. I know that the qpair management related with the group (there is a thread which manages the rdma_cm event and another is for polling). The reason that caused this issue is that: we use the spdk_send_msg to add the qpair to the group during the connection, but we return the connection ready to the host side early. So there is a window, that the host sends the disconnect, and there will be no group binding to the qpair. My idea is that we do this checks in spdk_nvmf_qpair_disconnect. If the qpair is not adding to the group, there will be two reasons:
>
> 1 In the group adding operation, there is an issue, so the qpair is freed. So we do not need to handle the disconnect and free the qpair again.
> 2 The qpair is still not added to the group.
>
> So I suggest if the qpair is not added in the case 2, we should deny the disconnect operation or delay such operation. (This is related with exceptional case, I think this handling is still reasonable. For normal operation, it will not do the connect, and disconnect with nothing operation. So deny the disconnect request this time is OK, and the initiator can do the disconnect later, and we can make sure the disconnect will only be accepted that the qpair is binding to the group. And I do not think that this strategy to handle the exceptional case will have any side effect.)
>
> BTW, only checking qpair->group is not sufficient, but we can also use the state of the qpair (which is already defined in spdk_nvmf_qpair_state).
>
> I would still like to post some patches to remove the inc/dec_refcnt (maybe as a test), and have those patches tested by our validation team to see whether this really solves the issue.
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
> Sent: Friday, December 21, 2018 1:16 AM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] Some thoughts for the code in rdma.c
>
> On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
>> Hi all,
>>
>> I would like to discuss the following functions in /lib/nvmf/rdma.c
>>
>> 1. I do not think the _nvmf_rdma_disconnect_retry function is necessary.
>> This function is used in nvmf_rdma_disconnect, where there is a
>> comment as follows:
>>
>>                 /* Read the group out of the qpair. This is normally
>> set and accessed only from
>>                 * the thread that created the group. Here, we're not on
>> that thread necessarily.
>>                 * The data member qpair->group begins it's life as NULL
>> and then is assigned to
>>                 * a pointer and never changes. So fortunately reading
>> this and checking for
>>                 * non-NULL is thread safe in the x86_64 memory model.
>> */
>>
>>            But adding the qpair to a group is related to the function
>> spdk_nvmf_poll_group_add: if the group is not ready, we can just call
>> spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect
>> should have a strategy to prevent re-entry if the qpair has not yet been
>> destroyed due to async behavior.
> There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.
>
> The RDMA CM Event channel notifies the target of two relevant events for a qpair
> - connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while another event is still in flight.
>
> The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.
>
> Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/437232/
>
>> 2. If so, the following two functions in rdma.c are also unnecessary:
>>
>> spdk_nvmf_rdma_qpair_inc_refcnt
>> spdk_nvmf_rdma_qpair_dec_refcnt
>>
>>
>> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect
>> directly in every transport, and not use too much async messaging
>> for this, because I do not think it is necessary.
>>
>> Thanks.
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-21  1:35 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-21  1:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5752 bytes --]

Hi Ben,

To my knowledge, I think that the inc/dec_refcnt is still not needed. I know that qpair management is related to the group (there is a thread which manages the rdma_cm events and another for polling). The cause of this issue is that we use spdk_send_msg to add the qpair to the group during the connection, but we report the connection as ready to the host side early. So there is a window in which the host sends the disconnect while there is still no group bound to the qpair. My idea is that we do this check in spdk_nvmf_qpair_disconnect. If the qpair is not added to the group, there are two possible reasons:

1. The group-add operation hit an issue, so the qpair was freed. In that case we do not need to handle the disconnect and free the qpair again.
2. The qpair has simply not been added to the group yet.

So I suggest that in case 2 we should deny or delay the disconnect operation. (This concerns an exceptional case, but I think the handling is still reasonable. A normal flow will not connect and then immediately disconnect without doing anything, so denying the disconnect request this time is OK; the initiator can disconnect later, and we can make sure the disconnect is only accepted once the qpair is bound to the group. I do not think this strategy for handling the exceptional case has any side effects.)

BTW, only checking qpair->group is not sufficient, but we can also use the state of the qpair (which is already defined in spdk_nvmf_qpair_state).
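As a rough sketch of the state-based check (the enum below is illustrative only; the real spdk_nvmf_qpair_state members may differ):

```c
#include <errno.h>

/* Illustrative states only, not the real spdk_nvmf_qpair_state. */
enum my_qpair_state {
	MY_QPAIR_UNINITIALIZED,
	MY_QPAIR_ACTIVATING,	/* add-to-group message still in flight */
	MY_QPAIR_ACTIVE,	/* bound to a poll group */
	MY_QPAIR_DEACTIVATING,	/* disconnect already underway */
};

/* Returns 0 if the disconnect may proceed, -EAGAIN if it should be
 * denied (the initiator can retry once the qpair joins a group), and
 * -EALREADY if a disconnect is already in progress. */
int my_qpair_check_disconnect(enum my_qpair_state state)
{
	switch (state) {
	case MY_QPAIR_ACTIVE:
		return 0;
	case MY_QPAIR_DEACTIVATING:
		return -EALREADY;
	default:
		return -EAGAIN;
	}
}
```

The point is that the disconnect is only accepted once the qpair has reached the state where it is bound to a group.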

I would still like to post some patches to remove the inc/dec_refcnt (maybe as a test), and have those patches tested by our validation team to see whether this really solves the issue.




Best Regards
Ziye Yang 

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
Sent: Friday, December 21, 2018 1:16 AM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Some thoughts for the code in rdma.c

On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
> Hi all,
> 
> I would like to discuss the following functions in /lib/nvmf/rdma.c
> 
> 1. I do not think the _nvmf_rdma_disconnect_retry function is necessary.
> This function is used in nvmf_rdma_disconnect, where there is a
> comment as follows:
> 
>                /* Read the group out of the qpair. This is normally 
> set and accessed only from
>                * the thread that created the group. Here, we're not on 
> that thread necessarily.
>                * The data member qpair->group begins it's life as NULL 
> and then is assigned to
>                * a pointer and never changes. So fortunately reading 
> this and checking for
>                * non-NULL is thread safe in the x86_64 memory model. 
> */
> 
>           But adding the qpair to a group is related to the function
> spdk_nvmf_poll_group_add: if the group is not ready, we can just call
> spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect
> should have a strategy to prevent re-entry if the qpair has not yet been
> destroyed due to async behavior.

There are two threads involved here typically - a management thread that is polling the RDMA CM Event channel and for IBV async events (which are independent things), and the poll group thread that owns the qpair.

The RDMA CM Event channel notifies the target of two relevant events for a qpair
- connect and disconnect. The IBV async event mechanism independently notifies the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue pair, but there are scenarios where we could legitimately get both events for the same qpair. The processing of both of these events involves passing a message to the poll group thread, which is an asynchronous process and may take an arbitrarily long time. The refcnt coordinates between those two events, such that the first event does not destroy the qpair while another event is still in flight.

The qpair itself, after the initial set up phase, is also managed by the poll group thread. If the user destroys a poll group, or manipulates the poll group to add or remove a different qpair, or the qpair gets a completion entry that it begins processing, then the qpair data structure is being modified by the poll group thread. You can't disconnect it on the management thread while this is happening or you risk corrupting some of the data structures - in particular the poll group's qpair list.

Checking that the qpair hasn't been assigned to a poll group yet (by looking for qpair.group set to NULL) isn't a sufficient protection because a message to add the qpair to the poll group may already be in-flight. I made some suggestions to use the refcnt to deal with this scenario on your most recent patch:

https://review.gerrithub.io/#/c/spdk/spdk/+/437232/

> 
> 2. If so, the following two functions in rdma.c are also unnecessary:
> 
> spdk_nvmf_rdma_qpair_inc_refcnt
> spdk_nvmf_rdma_qpair_dec_refcnt
> 
> 
> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect
> directly in every transport, and not use too much async messaging
> for this, because I do not think it is necessary.
> 
> Thanks.
> 
> 
> 
> 
> Best Regards
> Ziye Yang
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-20 17:15 Walker, Benjamin
  0 siblings, 0 replies; 13+ messages in thread
From: Walker, Benjamin @ 2018-12-20 17:15 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3574 bytes --]

On Thu, 2018-12-20 at 02:59 +0000, Yang, Ziye wrote:
> Hi all,
> 
> I would like to discuss the following functions in /lib/nvmf/rdma.c
> 
> 1. I do not think the _nvmf_rdma_disconnect_retry function is necessary.
> This function is used in nvmf_rdma_disconnect, where there is a comment as
> follows:
> 
>                /* Read the group out of the qpair. This is normally set and
> accessed only from
>                * the thread that created the group. Here, we're not on that
> thread necessarily.
>                * The data member qpair->group begins it's life as NULL and
> then is assigned to
>                * a pointer and never changes. So fortunately reading this and
> checking for
>                * non-NULL is thread safe in the x86_64 memory model. */
> 
>           But adding the qpair to a group is related to the function
> spdk_nvmf_poll_group_add: if the group is not ready, we can just call
> spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect
> should have a strategy to prevent re-entry if the qpair has not yet been
> destroyed due to async behavior.

There are two threads involved here typically - a management thread that is
polling the RDMA CM Event channel and for IBV async events (which are
independent things), and the poll group thread that owns the qpair.

The RDMA CM Event channel notifies the target of two relevant events for a qpair
- connect and disconnect. The IBV async event mechanism independently notifies
the target of IBV_EVENT_QP_FATAL conditions. The target's response to both an
RDMA_CM_EVENT_DISCONNECT and to an IBV_EVENT_QP_FATAL is to disconnect the queue
pair, but there are scenarios where we could legitimately get both events for
the same qpair. The processing of both of these events involves passing a
message to the poll group thread, which is an asynchronous process and may take
an arbitrarily long time. The refcnt coordinates between those two events, such
that the first event does not destroy the qpair while another event is still
in flight.
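To restate the refcnt idea as a toy, single-threaded model (the names are hypothetical; this is not the real spdk_nvmf_rdma_qpair_inc/dec_refcnt code): each in-flight event holds a reference, and only dropping the last reference actually frees the qpair.

```c
#include <stdbool.h>

/* Toy model only, not the SPDK implementation. */
struct event_qpair {
	int refcnt;
	bool destroyed;
};

void event_qpair_inc(struct event_qpair *qp)
{
	qp->refcnt++;
}

/* Returns true when this drop was the last reference and the qpair was
 * "freed" (modeled here by the destroyed flag). */
bool event_qpair_dec(struct event_qpair *qp)
{
	if (--qp->refcnt == 0) {
		qp->destroyed = true;	/* stands in for the real free */
		return true;
	}
	return false;
}
```

So if both an RDMA_CM_EVENT_DISCONNECT and an IBV_EVENT_QP_FATAL are in flight, each takes a reference, and whichever finishes second performs the actual teardown.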

The qpair itself, after the initial set up phase, is also managed by the poll
group thread. If the user destroys a poll group, or manipulates the poll group
to add or remove a different qpair, or the qpair gets a completion entry that it
begins processing, then the qpair data structure is being modified by the poll
group thread. You can't disconnect it on the management thread while this is
happening or you risk corrupting some of the data structures - in particular the
poll group's qpair list.
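That ownership rule can be sketched like this (a simplified, single-producer stand-in for the message-passing pattern; the real SPDK thread messaging is lock-free and multi-producer): other threads never touch the qpair directly, they enqueue a callback that only the owning poll group thread runs.

```c
#include <stddef.h>

/* Simplified illustration of deferring work to an owner thread. */
typedef void (*qp_msg_fn)(void *ctx);

struct qp_msg {
	qp_msg_fn fn;
	void *ctx;
};

#define MSG_QUEUE_DEPTH 16

struct owner_thread {
	struct qp_msg queue[MSG_QUEUE_DEPTH];
	int head;
	int tail;
};

/* Called from any thread: defer work to the owner instead of touching
 * its data structures (e.g. the poll group's qpair list) directly. */
int owner_send_msg(struct owner_thread *t, qp_msg_fn fn, void *ctx)
{
	int next = (t->tail + 1) % MSG_QUEUE_DEPTH;

	if (next == t->head) {
		return -1;	/* queue full */
	}
	t->queue[t->tail].fn = fn;
	t->queue[t->tail].ctx = ctx;
	t->tail = next;
	return 0;
}

/* Run only on the owner thread: drain and execute deferred work.
 * Returns the number of messages processed. */
int owner_poll(struct owner_thread *t)
{
	int n = 0;

	while (t->head != t->tail) {
		struct qp_msg *m = &t->queue[t->head];

		t->head = (t->head + 1) % MSG_QUEUE_DEPTH;
		m->fn(m->ctx);
		n++;
	}
	return n;
}

/* Example callback: pretend this disconnects a qpair on the owner thread. */
void count_disconnects(void *ctx)
{
	(*(int *)ctx)++;
}
```

The key property is that the deferred work runs at an arbitrarily later time, on the owner's poller, which is exactly why a bare qpair.group NULL check on the management thread can race with an add-to-group message already in the queue.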

Checking that the qpair hasn't been assigned to a poll group yet (by looking for
qpair.group set to NULL) isn't a sufficient protection because a message to add
the qpair to the poll group may already be in-flight. I made some suggestions to
use the refcnt to deal with this scenario on your most recent patch:

https://review.gerrithub.io/#/c/spdk/spdk/+/437232/

> 
> 2. If so, the following two functions in rdma.c are also unnecessary:
> 
> spdk_nvmf_rdma_qpair_inc_refcnt
> spdk_nvmf_rdma_qpair_dec_refcnt
> 
> 
> Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly
> in every transport, and not use too much async messaging for this,
> because I do not think it is necessary.
> 
> Thanks.
> 
> 
> 
> 
> Best Regards
> Ziye Yang
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [SPDK] Some thoughts for the code in rdma.c
@ 2018-12-20  2:59 Yang, Ziye
  0 siblings, 0 replies; 13+ messages in thread
From: Yang, Ziye @ 2018-12-20  2:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

Hi all,

I would like to discuss the following functions in /lib/nvmf/rdma.c

1. I do not think the _nvmf_rdma_disconnect_retry function is necessary. This function is used in nvmf_rdma_disconnect, where there is a comment as follows:

               /* Read the group out of the qpair. This is normally set and accessed only from
               * the thread that created the group. Here, we're not on that thread necessarily.
               * The data member qpair->group begins it's life as NULL and then is assigned to
               * a pointer and never changes. So fortunately reading this and checking for
               * non-NULL is thread safe in the x86_64 memory model. */

          But adding the qpair to a group is related to the function spdk_nvmf_poll_group_add: if the group is not ready, we can just call spdk_nvmf_qpair_disconnect in that function. And spdk_nvmf_qpair_disconnect should have a strategy to prevent re-entry if the qpair has not yet been destroyed due to async behavior.

2. If so, the following two functions in rdma.c are also unnecessary:

spdk_nvmf_rdma_qpair_inc_refcnt
spdk_nvmf_rdma_qpair_dec_refcnt


Generally, my idea is that we should use spdk_nvmf_qpair_disconnect directly in every transport, and not use too much async messaging for this, because I do not think it is necessary.

Thanks.




Best Regards
Ziye Yang


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-01-08  1:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-24  6:19 [SPDK] Some thoughts for the code in rdma.c Yang, Ziye
  -- strict thread matches above, loose matches on Subject: below --
2019-01-08  1:45 Yang, Ziye
2019-01-07 14:18 Walker, Benjamin
2018-12-24  6:13 Sasha Kotchubievsky
2018-12-24  5:51 Sasha Kotchubievsky
2018-12-24  5:49 Yang, Ziye
2018-12-24  5:40 Sasha Kotchubievsky
2018-12-24  2:01 Yang, Ziye
2018-12-24  1:27 Yang, Ziye
2018-12-23 14:09 Sasha Kotchubievsky
2018-12-21  1:35 Yang, Ziye
2018-12-20 17:15 Walker, Benjamin
2018-12-20  2:59 Yang, Ziye

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.