* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-04 22:09 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-04-04 22:09 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Thursday, March 28, 2019 9:41 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] spdk_bdev_close() threading
> 
> 
> 
> On 3/28/19, 7:56 AM, "SPDK on behalf of Stojaczyk, Dariusz" <spdk-
> bounces(a)lists.01.org on behalf of dariusz.stojaczyk(a)intel.com> wrote:
> 
> <snip>
> 
>     >
>     >     Letting spdk_bdev_close() internally schedule itself on
>     >     desc->thread looks good at first sight to me.
>     >
>     > I agree we can just have it internally schedule itself.  I think we'll
>     > still need to set desc->closed = true in the caller's thread, though,
>     > before we schedule it - otherwise we'll have a race with the
>     > _remove_notify path.
> 
>     Okay, cool. Do we have any unit tests for that?
> 
> Not really.  We have unit tests for simulating multi-thread cases (i.e. multiple
> spdk_thread objects) but it's done from a single pthread.
> 
> Thinking about this more though - how do we avoid this race?  We don't want
> bdev to call the user's remove callback after they've closed the descriptor.
> We ensure this today since the remove_callback gets called on the same
> thread the descriptor was opened on.  The user will have to be aware that if
> the descriptor was opened on thread A and is closed on thread B, it must be
> ready to handle a remove callback on thread A at any point up until
> spdk_bdev_close() returns on thread B.  That sounds messy.

If the application opens a desc on thread A and closes it on B, it needs to
handle multi-threading anyway. Generally, you could still open and close the
descriptor on a single thread if you want to keep things simple.

>  A typical first
> reaction to getting a remove callback is to close the descriptor - meaning the
> user has to add its own locking to ensure it doesn't try to close the same
> descriptor on two different threads.

But how could we possibly avoid that with multi-threading?
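
For illustration, that unavoidable caller-side guard looks something like this
(sketch only - `ctx`, its lock, and its `desc` field are hypothetical caller
state, not bdev API):

    /* Called from the hot-remove callback on thread A and from the
     * regular teardown path on thread B; the lock makes sure only one
     * of them actually closes the descriptor. */
    pthread_mutex_lock(&ctx->lock);
    if (ctx->desc != NULL) {
            spdk_bdev_close(ctx->desc);
            ctx->desc = NULL;
    }
    pthread_mutex_unlock(&ctx->lock);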

> 
> What if we just add a helper function (spdk_bdev_desc_get_thread?) to
> return the thread that the descriptor was opened on.  Then at least the user
> doesn't have to keep track of where it was opened itself.  This may still add
> some work on the caller side, but I think trying to push this into the bdev
> layer itself will actually create even more work (and trickier work) on the
> caller side to handle the hot remove race.

I managed to push it into the bdev layer with barely 70 lines of code:
https://review.gerrithub.io/c/spdk/spdk/+/450112

It doesn't affect any use cases which close the descriptor on the same
thread that opened it - it only lifts those single thread limitations.
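
For the caller this means, e.g. (sketch only - spdk_bdev_open() as it exists
today; "Malloc0" and hotremove_cb are just example names):

    struct spdk_bdev_desc *desc;

    /* thread A */
    spdk_bdev_open(spdk_bdev_get_by_name("Malloc0"), false,
                   hotremove_cb, NULL, &desc);

    /* ... later, on thread B - no assert(desc->thread == spdk_get_thread()) */
    spdk_bdev_close(desc);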

D.

> 
> -Jim
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-09-28  2:06 wuzhouhui
  0 siblings, 0 replies; 13+ messages in thread
From: wuzhouhui @ 2019-09-28  2:06 UTC (permalink / raw)
  To: spdk



> -----Original Messages-----
> From: "Stojaczyk, Dariusz" <dariusz.stojaczyk(a)intel.com>
> Sent Time: 2019-03-28 17:13:16 (Thursday)
> To: "Storage Performance Development Kit" <spdk(a)lists.01.org>
> Cc: 
> Subject: [SPDK] spdk_bdev_close() threading
> 
> I was surprised to see that bdev descriptors can be closed only from the same thread that opened them. Vhost doesn't respect this rule. As expected, I was able to trigger assert(desc->thread == spdk_get_thread()) while closing a vhost scsi descriptor using the latest SPDK master. This could probably be fixed by always scheduling spdk_bdev_close() to the proper thread. Maybe vhost could even immediately assume the descriptor is closed and set its `desc` pointer to NULL without waiting for spdk_bdev_close() to actually be called. But why does the descriptor need to be closed from a specific thread in the first place? Would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread?
> 
Darek:

The purpose of spdk_bdev_close() running on the same thread as spdk_bdev_open() is to prepare for patch https://review.gerrithub.io/c/spdk/spdk/+/423369.

Letting spdk_bdev_close() internally schedule itself on desc->thread looks good at first sight to me.

> A descriptor cannot be closed until all associated channels have been destroyed - that's what the bdev programming guide says. When there are multiple I/O channels, there has to be some multi-thread management involved. Also, those channels can't be closed until all their pending I/O has finished. So closing a descriptor will likely have the following flow:
> 
> external request (e.g. bdev hotremove or some RPC) -> start draining I/O on all threads -> destroy each i/o channel after its pending i/o has finished -> on the last thread to destroy a channel schedule closing the desc to a proper thread -> close the desc
> 
> This additional scheduling of spdk_bdev_close() looks completely unnecessary - it also forces the upper layer to maintain a pointer to the desc thread somewhere, because desc->thread is private to the bdev layer. So to repeat the question again -
> would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread, so that spdk_bdev_close() can be called from any thread?
> 
> D.
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-08 14:45 Andrey Kuzmin
  0 siblings, 0 replies; 13+ messages in thread
From: Andrey Kuzmin @ 2019-04-08 14:45 UTC (permalink / raw)
  To: spdk


On Fri, Apr 5, 2019, 20:10 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Fri, 2019-04-05 at 08:13 +0000, Stojaczyk, Dariusz wrote:
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> > > Sent: Friday, April 5, 2019 1:35 AM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] spdk_bdev_close() threading
> > >
> > > <SNIP>
> > >
> > >     I managed to push it into the bdev layer with barely 70 lines of code:
> > >     https://review.gerrithub.io/c/spdk/spdk/+/450112
> > >
> > > This doesn't handle the race I mentioned above.  Consider this case:
> > >
> > > Thread A opens a bdev.
> > > Thread B later decides it's going to close the bdev for some reason.
> > > Maybe the vhost session, nvmf subsystem, or iscsi target node is being
> > > deleted.  There may be other reasons for user applications (outside of
> > > SPDK) using the bdev layer.  So thread B calls spdk_bdev_close().
> > > At the same time, that bdev is hot-removed.  Thread A will then call
> > > back to the user that the bdev was hot removed.  This thread calls
> > > spdk_bdev_close() to close the descriptor (which is a normal reaction
> > > to getting a hot remove notification).
> > >
> > > To prevent these two threads from racing to call spdk_bdev_close(), the
> > > caller needs to add locking or some other kind of synchronization.
> > > This is on top of the extra locking added in this patch.
> > >
> > > I guess I don't understand why this can't be handled in vhost.  Have
> > > vhost send a message to the thread that constructed the SCSI device
> > > when it wants to destruct it (which closes the underlying bdevs).  Then
> > > that thread doesn't have to worry about any races - if the bdev was
> > > hotremoved by the time the message arrived, it just doesn't call
> > > spdk_bdev_close().
> >
> > I find the vhost logic very complicated already and would like to push
> > the extra complexity somewhere else if possible. In that extra vhost
> > message we would have to:
> > * check if the desc wasn't closed before
> > * check if the entire vhost device is still there - it could have been
> > removed e.g. on user request
> > * make sure we close the proper descriptor. The one that this message
> > was originally scheduled for might have been removed and a new descriptor
> > has been put in its place
> > * because of the above checks we need to lock anyway
> >
> > We don't have any mechanism in vhost to safely send a message to an
> > arbitrary thread. Most of the multi-thread communication is done with
> > spdk_vhost_dev_foreach_session(), which can be called from any thread
> > and basically does:
> >  * execute a callback on each session's thread
> >  * lock the global mutex before each callback, so that it can safely
> > access the global vhost device state (which is the usual case for
> > foreach_session() - we allow changing the device state from any thread
> > locking the mutex, then use foreach_session() to make each session
> > retrieve the device changes).
> >  * keep the internal refcount of pending asynchronous vhost device
> > operations, so the vhost device can't be removed in the meantime
> >
> > It's not suitable for what you're asking.
>
> Note that the NVMe-oF target has a designated thread for each NVMe-oF
> subsystem, and all operations on that subsystem (including all bdev opens
> and closes) get funneled there. It then has additional infrastructure to
> send a message to every thread in the target to either update local data
> caches or to pause/resume I/O on those threads.
>
> NVMe-oF has some issues to work out with hotplug at the moment, but
> design-wise I think the threading model it uses is the right one and we
> need to move both vhost and iSCSI to a similar model. The way the NVMe-oF
> target works is also ideal for the upcoming lightweight threading changes.
>

It's rather surprising, then, that the above threading model hasn't been
applied to the bdev layer, where shared resources - such as the bdev list -
and the calls that modify it - such as register/unregister - are neither
protected nor handled by a dedicated bdev manager thread. This makes me
worry about how (or whether) these resources will be managed safely in a
multithreaded environment.

Regards,
Andrey

>
> >
> > > I know there are some cases where we have to do locking in SPDK.  But
> > > I'd rather avoid adding it if possible.  This stuff is inherently
> > > difficult to really test and ensure we get right.  This patch may only
> > > be about 120 lines of code, but we don't have any tests to make sure
> > > it's correct.
> >
> > I'm not sure if you saw them - this patch contains some unit tests for
> > spdk_bdev_close() called concurrently with spdk_bdev_unregister() on
> > a different thread. I stress-tested the real case scenario in vhost
> > locally, but as for the CI, those unit tests are the best I can push
> > right now.
> >
> > D.
> >
> > > -Jim
> > >
> > >
> > >     It doesn't affect any use cases which close the descriptor on the
> same
> > >     thread that opened it - it only lifts those single thread
> limitations.
> > >
> > >     D.
> > >
> > >     >
> > >     > -Jim
> > >     >
> > >     >
> > >     > _______________________________________________
> > >     > SPDK mailing list
> > >     > SPDK(a)lists.01.org
> > >     > https://lists.01.org/mailman/listinfo/spdk
> > >     _______________________________________________
> > >     SPDK mailing list
> > >     SPDK(a)lists.01.org
> > >     https://lists.01.org/mailman/listinfo/spdk
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-05 19:32 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-04-05 19:32 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> Benjamin
> Sent: Friday, April 5, 2019 7:11 PM
> To: spdk(a)lists.01.org
> Subject: Re: [SPDK] spdk_bdev_close() threading
> 
> <snip>
>
> Note that the NVMe-oF target has a designated thread for each NVMe-oF
> subsystem, and all operations on that subsystem (including all bdev opens
> and closes) get funneled there. It then has additional infrastructure to
> send a message to every thread in the target to either update local data
> caches or to pause/resume I/O on those threads.
> 
> NVMe-oF has some issues to work out with hotplug at the moment, but
> design-wise I think the threading model it uses is the right one and we
> need to move both vhost and iSCSI to a similar model. The way the NVMe-oF
> target works is also ideal for the upcoming lightweight threading changes.

Maybe. I guess I still need some more arguments as to why we would
want to force the application (possibly a user application) to behave this
way. Why wouldn't we give it a free hand?

D.

> 
> >
> > > I know there are some cases where we have to do locking in SPDK.  But
> > > I'd rather avoid adding it if possible.  This stuff is inherently
> > > difficult to really test and ensure we get right.  This patch may only
> > > be about 120 lines of code, but we don't have any tests to make sure
> > > it's correct.
> >
> > I'm not sure if you saw them - this patch contains some unit tests for
> > spdk_bdev_close() called concurrently with spdk_bdev_unregister() on
> > a different thread. I stress-tested the real case scenario in vhost
> > locally, but as for the CI, those unit tests are the best I can push
> > right now.
> >
> > D.
> >
> > > -Jim
> > >
> > >
> > >     It doesn't affect any use cases which close the descriptor on the same
> > >     thread that opened it - it only lifts those single thread limitations.
> > >
> > >     D.
> > >
> > >     >
> > >     > -Jim
> > >     >
> > >     >
> > >     > _______________________________________________
> > >     > SPDK mailing list
> > >     > SPDK(a)lists.01.org
> > >     > https://lists.01.org/mailman/listinfo/spdk
> > >     _______________________________________________
> > >     SPDK mailing list
> > >     SPDK(a)lists.01.org
> > >     https://lists.01.org/mailman/listinfo/spdk
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-05 17:10 Walker, Benjamin
  0 siblings, 0 replies; 13+ messages in thread
From: Walker, Benjamin @ 2019-04-05 17:10 UTC (permalink / raw)
  To: spdk


On Fri, 2019-04-05 at 08:13 +0000, Stojaczyk, Dariusz wrote:
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> > Sent: Friday, April 5, 2019 1:35 AM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] spdk_bdev_close() threading
> > 
> > <SNIP>
> > 
> >     I managed to push it into the bdev layer with barely 70 lines of code:
> >     https://review.gerrithub.io/c/spdk/spdk/+/450112
> > 
> > This doesn't handle the race I mentioned above.  Consider this case:
> > 
> > Thread A opens a bdev.
> > Thread B later decides it's going to close the bdev for some reason.  Maybe
> > the vhost session, nvmf subsystem, or iscsi target node is being deleted.
> > There may be other reasons for user applications (outside of SPDK) using the
> > bdev layer.  So thread B calls spdk_bdev_close().
> > At the same time, that bdev is hot-removed.  Thread A will then call back to
> > the user that the bdev was hot removed.  This thread calls
> > spdk_bdev_close() to close the descriptor (which is a normal reaction to
> > getting a hot remove notification).
> > 
> > To prevent these two threads from racing to call spdk_bdev_close(), the
> > caller needs to add locking or some other kind of synchronization.  This is
> > on
> > top of the extra locking added in this patch.
> > 
> > I guess I don't understand why this can't be handled in vhost.  Have vhost
> > send a message to the thread that constructed the SCSI device when it wants
> > to destruct it (which closes the underlying bdevs).  Then that thread
> > doesn't
> > have to worry about any races - if the bdev was hotremoved by the time the
> > message arrived, it just doesn't call spdk_bdev_close().
> 
> I find the vhost logic very complicated already and would like to push the
> extra complexity somewhere else if possible. In that extra vhost message
> we would have to:
> * check if the desc wasn't closed before
> * check if the entire vhost device is still there - it could have been removed
> e.g. on user request
> * make sure we close the proper descriptor. The one that this message
> was originally scheduled for might have been removed and a new descriptor
> has been put in its place
> * because of the above checks we need to lock anyway
> 
> We don't have any mechanism in vhost to safely send a message to an
> arbitrary thread. Most of the multi-thread communication is done with
> spdk_vhost_dev_foreach_session(), which can be called from any thread
> and basically does:
>  * execute a callback on each session's thread
>  * lock the global mutex before each callback, so that it can safely access
> the
> global vhost device state (which is the usual case for foreach_session() - we
> allow changing the device state from any thread locking the mutex, then
> use foreach_session() to make each session retrieve the device changes).
>  * keep the internal refcount of pending asynchronous vhost device
> operations, so the vhost device can't be removed in the meantime
> 
> It's not suitable for what you're asking.

Note that the NVMe-oF target has a designated thread for each NVMe-oF subsystem,
and all operations on that subsystem (including all bdev opens and closes) get
funneled there. It then has additional infrastructure to send a message to every
thread in the target to either update local data caches or to pause/resume I/O
on those threads.

NVMe-oF has some issues to work out with hotplug at the moment, but design-wise
I think the threading model it uses is the right one and we need to move both
vhost and iSCSI to a similar model. The way the NVMe-oF target works is also
ideal for the upcoming lightweight threading changes.
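
Roughly, that funneling pattern looks like this (sketch with illustrative
names rather than the actual nvmf code - `subsystem->thread` stands for the
designated thread):

    struct close_ctx {
            struct spdk_bdev_desc *desc;
    };

    static void
    subsystem_close_desc(void *arg)
    {
            struct close_ctx *ctx = arg;

            /* Runs on the subsystem's designated thread - no locks needed,
             * since every open/close for this subsystem lands here. */
            spdk_bdev_close(ctx->desc);
            free(ctx);
    }

    /* from any thread: */
    spdk_thread_send_msg(subsystem->thread, subsystem_close_desc, ctx);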

> 
> > I know there are some cases where we have to do locking in SPDK.  But I'd
> > rather avoid adding it if possible.  This stuff is inherently difficult to
> > really test and ensure we get right.  This patch may only be about 120 lines of
> > code,
> > but we don't have any tests to make sure it's correct.
> 
> I'm not sure if you saw them - this patch contains some unit tests for
> spdk_bdev_close() called concurrently with spdk_bdev_unregister() on
> a different thread. I stress-tested the real case scenario in vhost locally,
> but as for the CI, those unit tests are the best I can push right now.
> 
> D.
> 
> > -Jim
> > 
> > 
> >     It doesn't affect any use cases which close the descriptor on the same
> >     thread that opened it - it only lifts those single thread limitations.
> > 
> >     D.
> > 
> >     >
> >     > -Jim
> >     >
> >     >
> >     > _______________________________________________
> >     > SPDK mailing list
> >     > SPDK(a)lists.01.org
> >     > https://lists.01.org/mailman/listinfo/spdk
> >     _______________________________________________
> >     SPDK mailing list
> >     SPDK(a)lists.01.org
> >     https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk



* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-05  8:13 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-04-05  8:13 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Friday, April 5, 2019 1:35 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] spdk_bdev_close() threading
> 
> <SNIP>
> 
>     I managed to push it into the bdev layer with barely 70 lines of code:
>     https://review.gerrithub.io/c/spdk/spdk/+/450112
> 
> This doesn't handle the race I mentioned above.  Consider this case:
> 
> Thread A opens a bdev.
> Thread B later decides it's going to close the bdev for some reason.  Maybe
> the vhost session, nvmf subsystem, or iscsi target node is being deleted.
> There may be other reasons for user applications (outside of SPDK) using the
> bdev layer.  So thread B calls spdk_bdev_close().
> At the same time, that bdev is hot-removed.  Thread A will then call back to
> the user that the bdev was hot removed.  This thread calls
> spdk_bdev_close() to close the descriptor (which is a normal reaction to
> getting a hot remove notification).
> 
> To prevent these two threads from racing to call spdk_bdev_close(), the
> caller needs to add locking or some other kind of synchronization.  This is on
> top of the extra locking added in this patch.
> 
> I guess I don't understand why this can't be handled in vhost.  Have vhost
> send a message to the thread that constructed the SCSI device when it wants
> to destruct it (which closes the underlying bdevs).  Then that thread doesn't
> have to worry about any races - if the bdev was hotremoved by the time the
> message arrived, it just doesn't call spdk_bdev_close().

I find the vhost logic very complicated already and would like to push the
extra complexity somewhere else if possible. In that extra vhost message
we would have to:
* check if the desc wasn't closed before
* check if the entire vhost device is still there - it could have been removed
e.g. on user request
* make sure we close the proper descriptor. The one that this message
was originally scheduled for might have been removed and a new descriptor
has been put in its place
* because of the above checks we need to lock anyway

We don't have any mechanism in vhost to safely send a message to an
arbitrary thread. Most of the multi-thread communication is done with
spdk_vhost_dev_foreach_session(), which can be called from any thread
and basically does:
 * execute a callback on each session's thread
 * lock the global mutex before each callback, so that it can safely access the
global vhost device state (which is the usual case for foreach_session() - we
allow changing the device state from any thread locking the mutex, then
use foreach_session() to make each session retrieve the device changes).
 * keep the internal refcount of pending asynchronous vhost device
operations, so the vhost device can't be removed in the meantime

It's not suitable for what you're asking.
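
For reference, the shape of each foreach_session() step is roughly (a heavily
simplified sketch of the three points above, not the actual vhost code;
`struct foreach_ctx` and its fields are made up):

    static void
    foreach_session_on_thread(void *arg)
    {
            struct foreach_ctx *ctx = arg;

            /* Taken before each callback so the callback can safely read
             * and modify the global vhost device state. */
            pthread_mutex_lock(&g_vhost_mutex);
            ctx->cb_fn(ctx->vdev, ctx->vsession, ctx->arg);
            pthread_mutex_unlock(&g_vhost_mutex);

            /* ...then forward to the next session's thread; the pending-op
             * refcount taken at the start is dropped after the last one. */
    }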

> 
> I know there are some cases where we have to do locking in SPDK.  But I'd
> rather avoid adding it if possible.  This stuff is inherently difficult to really
> test and ensure we get right.  This patch may only be about 120 lines of code,
> but we don't have any tests to make sure it's correct.

I'm not sure if you saw them - this patch contains some unit tests for
spdk_bdev_close() called concurrently with spdk_bdev_unregister() on
a different thread. I stress-tested the real case scenario in vhost locally, but
as for the CI, those unit tests are the best I can push right now.

D.

> 
> -Jim
> 
> 
>     It doesn't affect any use cases which close the descriptor on the same
>     thread that opened it - it only lifts those single thread limitations.
> 
>     D.
> 
>     >
>     > -Jim
>     >
>     >
>     > _______________________________________________
>     > SPDK mailing list
>     > SPDK(a)lists.01.org
>     > https://lists.01.org/mailman/listinfo/spdk
>     _______________________________________________
>     SPDK mailing list
>     SPDK(a)lists.01.org
>     https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-04-04 23:35 Harris, James R
  0 siblings, 0 replies; 13+ messages in thread
From: Harris, James R @ 2019-04-04 23:35 UTC (permalink / raw)
  To: spdk




On 4/4/19, 3:09 PM, "SPDK on behalf of Stojaczyk, Dariusz" <spdk-bounces(a)lists.01.org on behalf of dariusz.stojaczyk(a)intel.com> wrote:

<snip>

    > 
    > Thinking about this more though - how do we avoid this race?  We don't want
    > bdev to call the user's remove callback after they've closed the descriptor.
    > We ensure this today since the remove_callback gets called on the same
    > thread the descriptor was opened on.  The user will have to be aware
    > that if the descriptor was opened on thread A and is closed on thread
    > B, it must be ready to handle a remove callback on thread A at any
    > point up until spdk_bdev_close() returns on thread B.  That sounds messy.
    
    If the application opens a desc on thread A and closes it on B, it needs to
    handle multi-threading anyway. Generally, you could still open and close the
    descriptor on a single thread if you want to keep things simple.
    
    >  A typical first
    > reaction to getting a remove callback is to close the descriptor - meaning the
    > user has to add its own locking to ensure it doesn't try to close the same
    > descriptor on two different threads.
    
    But how could we possibly avoid that with multi-threading?
    
    > 
    > What if we just add a helper function (spdk_bdev_desc_get_thread?) to
    > return the thread that the descriptor was opened on.  Then at least the user
    > doesn't have to keep track of where it was opened itself.  This may still add
    > some work on the caller side, but I think trying to push this into the bdev
    > layer itself will actually create even more work (and trickier work) on the
    > caller side to handle the hot remove race.
    
    I managed to push it into the bdev layer with barely 70 lines of code:
    https://review.gerrithub.io/c/spdk/spdk/+/450112

This doesn't handle the race I mentioned above.  Consider this case:

Thread A opens a bdev.
Thread B later decides it's going to close the bdev for some reason.  Maybe the vhost session, nvmf subsystem, or iscsi target node is being deleted.  There may be other reasons for user applications (outside of SPDK) using the bdev layer.  So thread B calls spdk_bdev_close().
At the same time, that bdev is hot-removed.  Thread A will then call back to the user that the bdev was hot removed.  This thread calls spdk_bdev_close() to close the descriptor (which is a normal reaction to getting a hot remove notification).

To prevent these two threads from racing to call spdk_bdev_close(), the caller needs to add locking or some other kind of synchronization.  This is on top of the extra locking added in this patch.

I guess I don't understand why this can't be handled in vhost.  Have vhost send a message to the thread that constructed the SCSI device when it wants to destruct it (which closes the underlying bdevs).  Then that thread doesn't have to worry about any races - if the bdev was hotremoved by the time the message arrived, it just doesn't call spdk_bdev_close().

I know there are some cases where we have to do locking in SPDK.  But I'd rather avoid adding it if possible.  This stuff is inherently difficult to really test and ensure we get right.  This patch may only be about 120 lines of code, but we don't have any tests to make sure it's correct.

-Jim

    
    It doesn't affect any use cases which close the descriptor on the same
    thread that opened it - it only lifts those single thread limitations.
    
    D.
    
    > 
    > -Jim
    > 
    > 
    > _______________________________________________
    > SPDK mailing list
    > SPDK(a)lists.01.org
    > https://lists.01.org/mailman/listinfo/spdk
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    



* Re: [SPDK] spdk_bdev_close() threading
@ 2019-03-28 20:40 Harris, James R
  0 siblings, 0 replies; 13+ messages in thread
From: Harris, James R @ 2019-03-28 20:40 UTC (permalink / raw)
  To: spdk




On 3/28/19, 7:56 AM, "SPDK on behalf of Stojaczyk, Dariusz" <spdk-bounces(a)lists.01.org on behalf of dariusz.stojaczyk(a)intel.com> wrote:

<snip>

    > 
    >     Letting spdk_bdev_close() internally schedule itself on desc->thread
    >     looks good at first sight to me.
    > 
    > I agree we can just have it internally schedule itself.  I think we'll still need to
    > set desc->closed = true in the caller's thread, though, before we schedule it -
    > otherwise we'll have a race with the _remove_notify path.
    
    Okay, cool. Do we have any unit tests for that?

Not really.  We have unit tests for simulating multi-thread cases (i.e. multiple spdk_thread objects) but it's done from a single pthread.
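
(That framework drives multiple spdk_thread objects from one pthread, roughly
like this - exact helper names and signatures vary by SPDK version:

    struct spdk_thread *a = spdk_thread_create("ut_thread_a", NULL);
    struct spdk_thread *b = spdk_thread_create("ut_thread_b", NULL);

    spdk_set_thread(a);          /* "run" on thread A */
    /* ... open the desc ... */
    spdk_set_thread(b);          /* "switch" to thread B, same pthread */
    /* ... close the desc ... */
    spdk_thread_poll(a, 0, 0);   /* drain messages sent to thread A */

so thread interleavings are simulated, but nothing truly runs in parallel.)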

Thinking about this more though - how do we avoid this race?  We don't want bdev to call the user's remove callback after they've closed the descriptor.  We ensure this today since the remove_callback gets called on the same thread the descriptor was opened on.  The user will have to be aware that if the descriptor was opened on thread A and is closed on thread B, it must be ready to handle a remove callback on thread A at any point up until spdk_bdev_close() returns on thread B.  That sounds messy.  A typical first reaction to getting a remove callback is to close the descriptor - meaning the user has to add its own locking to ensure it doesn't try to close the same descriptor on two different threads.

What if we just add a helper function (spdk_bdev_desc_get_thread?) to return the thread that the descriptor was opened on.  Then at least the user doesn't have to keep track of where it was opened itself.  This may still add some work on the caller side, but I think trying to push this into the bdev layer itself will actually create even more work (and trickier work) on the caller side to handle the hot remove race.
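
The helper itself would be trivial (sketch of the proposal; close_desc_msg is
a caller-supplied function, not bdev API):

    struct spdk_thread *
    spdk_bdev_desc_get_thread(struct spdk_bdev_desc *desc)
    {
            return desc->thread;
    }

    /* caller, from any thread: */
    spdk_thread_send_msg(spdk_bdev_desc_get_thread(desc),
                         close_desc_msg, desc);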

-Jim




* Re: [SPDK] spdk_bdev_close() threading
@ 2019-03-28 14:56 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-03-28 14:56 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Thursday, March 28, 2019 3:42 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] spdk_bdev_close() threading
> 
> 
> 
> On 3/28/19, 3:20 AM, "SPDK on behalf of wuzhouhui" <spdk-
> bounces(a)lists.01.org on behalf of wuzhouhui14(a)mails.ucas.ac.cn> wrote:
> 
> 
>     > -----Original Messages-----
>     > From: "Stojaczyk, Dariusz" <dariusz.stojaczyk(a)intel.com>
>     > Sent Time: 2019-03-28 17:13:16 (Thursday)
>     > To: "Storage Performance Development Kit" <spdk(a)lists.01.org>
>     > Cc:
>     > Subject: [SPDK] spdk_bdev_close() threading
>     >
>     > I was surprised to see that bdev descriptors can be closed only from
>     > the same thread that opened them. Vhost doesn't respect this rule. As
>     > expected, I was able to trigger assert(desc->thread == spdk_get_thread())
>     > while closing a vhost scsi descriptor using the latest SPDK master. This
>     > could probably be fixed by always scheduling spdk_bdev_close() to the
>     > proper thread. Maybe vhost could even immediately assume the descriptor
>     > is closed and set its `desc` pointer to NULL without waiting for
>     > spdk_bdev_close() to actually be called. But why does the descriptor
>     > need to be closed from a specific thread in the first place? Would it
>     > be possible for spdk_bdev_close() to internally schedule itself on
>     > desc->thread?
>     >
>     Darek:
> 
>     The purpose of spdk_bdev_close() running on the same thread as
>     spdk_bdev_open() is to prepare for patch
>     https://review.gerrithub.io/c/spdk/spdk/+/423369.
> 
>     Letting spdk_bdev_close() internally schedule itself on desc->thread
>     looks good at first sight to me.
> 
> I agree we can just have it internally schedule itself.  I think we'll still need to
> set desc->closed = true in the caller's thread, though, before we schedule it -
> otherwise we'll have a race with the _remove_notify path.

Okay, cool. Do we have any unit tests for that?

D.

> 
> -Jim
> 
> 
>     > A descriptor cannot be closed until all associated channels have been
>     > destroyed - that's what the bdev programming guide says. When there are
>     > multiple I/O channels, there has to be some multi-thread management
>     > involved. Also, those channels can't be closed until all their pending
>     > I/O has finished. So closing a descriptor will likely have the
>     > following flow:
>     >
>     > external request (e.g. bdev hotremove or some RPC) -> start draining
>     > I/O on all threads -> destroy each i/o channel after its pending i/o
>     > has finished -> on the last thread to destroy a channel schedule
>     > closing the desc to a proper thread -> close the desc
>     >
>     > This additional scheduling of spdk_bdev_close() looks completely
>     > unnecessary - it also forces the upper layer to maintain a pointer to
>     > the desc thread somewhere, because desc->thread is private to the bdev
>     > layer. So to repeat the question again -
>     > would it be possible for spdk_bdev_close() to internally schedule
>     > itself on desc->thread, so that spdk_bdev_close() can be called from
>     > any thread?
>     >
>     > D.
>     >
>     > _______________________________________________
>     > SPDK mailing list
>     > SPDK(a)lists.01.org
>     > https://lists.01.org/mailman/listinfo/spdk
>     _______________________________________________
>     SPDK mailing list
>     SPDK(a)lists.01.org
>     https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-03-28 14:55 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-03-28 14:55 UTC (permalink / raw)
  To: spdk




> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
> Sent: Thursday, March 28, 2019 11:40 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] spdk_bdev_close() threading
> 
> On Thu, Mar 28, 2019, 12:13 Stojaczyk, Dariusz <dariusz.stojaczyk(a)intel.com>
> wrote:
> 
> > I was surprised to see that bdev descriptors can be closed only from the
> > same thread that opened them.
> 
> 
> Sounds counterintuitive, since a descriptor, contrary to an io channel, may
> be shared among multiple threads. Furthermore, I believe such a requirement,
> if introduced, will break a lot more code than just vhost.
> 
> > Vhost doesn't respect this rule. As expected, I was able to trigger
> > assert(desc->thread == spdk_get_thread()) while closing a vhost scsi
> > descriptor using the latest SPDK master. This could probably be fixed by
> > always scheduling spdk_bdev_close() to the proper thread. Maybe vhost
> > could even immediately assume the descriptor is closed and set its `desc`
> > pointer to NULL without waiting for spdk_bdev_close() to actually be
> > called. But why does the descriptor need to be closed from a specific
> > thread in the first place? Would it be possible for spdk_bdev_close() to
> > internally schedule itself on desc->thread?
> >
> > A descriptor cannot be closed until all associated channels have been
> > destroyed - that's what the bdev programming guide says. When there are
> > multiple I/O channels, there has to be some multi-thread management
> > involved. Also, those channels can't be closed until all their pending I/O
> > has finished. So closing a descriptor will likely have the following flow:
> >
> > external request (e.g. bdev hotremove or some RPC) -> start draining I/O
> > on all threads -> destroy each i/o channel after its pending i/o has
> > finished -> on the last thread to destroy a channel schedule closing the
> > desc to a proper thread -> close the desc
> >
> 
> I believe this is what is very much already happening when bdev_close is
> issued, although wrt the underlying io device (which is the actual
> reference-counting entity for the i/o channels).

That's true. With the current implementation you can even close the descriptor before putting any io channels. The bdev will be unregistered on the proper thread once the last io channel is destroyed. It's not safe to rely on this behavior, but that's actually how vhost block hotremove works.

D.

> 
> Regards,
> Andrey
> 
> >
> > This additional scheduling of spdk_bdev_close() looks completely
> > unnecessary - it also forces the upper layer to maintain a pointer to the
> > desc thread somewhere, because desc->thread is private to the bdev layer.
> > So to repeat the question again -
> > would it be possible for spdk_bdev_close() to internally schedule itself
> > on desc->thread, so that spdk_bdev_close() can be called from any thread?
> >
> > D.
> >
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> >
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] spdk_bdev_close() threading
@ 2019-03-28 14:41 Harris, James R
  0 siblings, 0 replies; 13+ messages in thread
From: Harris, James R @ 2019-03-28 14:41 UTC (permalink / raw)
  To: spdk




On 3/28/19, 3:20 AM, "SPDK on behalf of wuzhouhui" <spdk-bounces(a)lists.01.org on behalf of wuzhouhui14(a)mails.ucas.ac.cn> wrote:

    
    > -----Original Messages-----
    > From: "Stojaczyk, Dariusz" <dariusz.stojaczyk(a)intel.com>
    > Sent Time: 2019-03-28 17:13:16 (Thursday)
    > To: "Storage Performance Development Kit" <spdk(a)lists.01.org>
    > Cc: 
    > Subject: [SPDK] spdk_bdev_close() threading
    > 
    > I was surprised to see that bdev descriptors can be closed only from the same thread that opened them. Vhost doesn't respect this rule. As expected, I was able to trigger assert(desc->thread == spdk_get_thread()) while closing a vhost scsi descriptor using the latest SPDK master. This could probably be fixed by always scheduling spdk_bdev_close() to the proper thread. Maybe vhost could even immediately assume the descriptor is closed and set its `desc` pointer to NULL without waiting for spdk_bdev_close() to actually be called. But why does the descriptor need to be closed from a specific thread in the first place? Would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread?
    > 
    Darek:
    
    The purpose of spdk_bdev_close() running on the same thread as spdk_bdev_open() is to prepare for patch https://review.gerrithub.io/c/spdk/spdk/+/423369.
    
    Letting spdk_bdev_close() internally schedule itself on desc->thread looks good at first sight to me.

I agree we can just have it internally schedule itself.  I think we'll still need to set desc->closed = true in the caller's thread, though, before we schedule it - otherwise we'll have a race with the _remove_notify path.
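
Something along these lines, I think (sketch only - _spdk_bdev_close_msg and
_spdk_bdev_close are made-up internal names, while desc->thread and
desc->closed are the fields discussed above):

    static void
    _spdk_bdev_close_msg(void *ctx)
    {
            struct spdk_bdev_desc *desc = ctx;

            /* Now on desc->thread - do the real teardown. */
            _spdk_bdev_close(desc);
    }

    void
    spdk_bdev_close(struct spdk_bdev_desc *desc)
    {
            if (desc->thread == spdk_get_thread()) {
                    _spdk_bdev_close(desc);
                    return;
            }

            /* Mark closed on the caller's thread *before* scheduling, so
             * _remove_notify stops delivering the hot-remove callback. */
            desc->closed = true;
            spdk_thread_send_msg(desc->thread, _spdk_bdev_close_msg, desc);
    }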

-Jim

    
    > A descriptor cannot be closed until all associated channels have been destroyed - that's what the bdev programming guide says. When there are multiple I/O channels, there has to be some multi-thread management involved. Also, those channels can't be closed until all their pending I/O has finished. So closing a descriptor will likely have the following flow:
    > 
    > external request (e.g. bdev hotremove or some RPC) -> start draining I/O on all threads -> destroy each i/o channel after its pending i/o has finished -> on the last thread to destroy a channel schedule closing the desc to a proper thread -> close the desc
    > 
    > This additional scheduling of spdk_bdev_close() looks completely unnecessary - it also forces the upper layer to maintain a pointer to the desc thread somewhere, because desc->thread is private to the bdev layer. So to repeat the question again -
    > would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread, so that spdk_bdev_close() can be called from any thread?
    > 
    > D.
    > 
    > _______________________________________________
    > SPDK mailing list
    > SPDK(a)lists.01.org
    > https://lists.01.org/mailman/listinfo/spdk
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    



* Re: [SPDK] spdk_bdev_close() threading
@ 2019-03-28 10:40 Andrey Kuzmin
  0 siblings, 0 replies; 13+ messages in thread
From: Andrey Kuzmin @ 2019-03-28 10:40 UTC (permalink / raw)
  To: spdk


On Thu, Mar 28, 2019, 12:13 Stojaczyk, Dariusz <dariusz.stojaczyk(a)intel.com>
wrote:

> I was surprised to see that bdev descriptors can be closed only from the
> same thread that opened them.


Sounds counterintuitive, since a descriptor, contrary to an io channel, may
be shared among multiple threads. Furthermore, I believe such a requirement,
if introduced, will break a lot more code than just vhost.

> Vhost doesn't respect this rule. As expected, I was able to trigger
> assert(desc->thread == spdk_get_thread()) while closing a vhost scsi
> descriptor using the latest SPDK master. This could probably be fixed by
> always scheduling spdk_bdev_close() to the proper thread. Maybe vhost could
> even immediately assume the descriptor is closed and set its `desc` pointer
> to NULL without waiting for spdk_bdev_close() to actually be called. But
> why does the descriptor need to be closed from a specific thread in the
> first place? Would it be possible for spdk_bdev_close() to internally
> schedule itself on desc->thread?
>
> A descriptor cannot be closed until all associated channels have been
> destroyed - that's what the bdev programming guide says. When there are
> multiple I/O channels, there has to be some multi-thread management
> involved. Also, those channels can't be closed until all their pending I/O
> has finished. So closing a descriptor will likely have the following flow:
>
> external request (e.g. bdev hotremove or some RPC) -> start draining I/O
> on all threads -> destroy each i/o channel after its pending i/o has
> finished -> on the last thread to destroy a channel schedule closing the
> desc to a proper thread -> close the desc
>

I believe this is what is very much already happening when bdev_close is
issued, although wrt the underlying io device (which is the actual
reference-counting entity for the i/o channels).

Regards,
Andrey

>
> This additional scheduling of spdk_bdev_close() looks completely
> unnecessary - it also forces the upper layer to maintain a pointer to the
> desc thread somewhere, because desc->thread is private to the bdev layer.
> So to repeat the question again -
> would it be possible for spdk_bdev_close() to internally schedule itself
> on desc->thread, so that spdk_bdev_close() can be called from any thread?
>
> D.
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>


* [SPDK] spdk_bdev_close() threading
@ 2019-03-28  9:13 Stojaczyk, Dariusz
  0 siblings, 0 replies; 13+ messages in thread
From: Stojaczyk, Dariusz @ 2019-03-28  9:13 UTC (permalink / raw)
  To: spdk


I was surprised to see that bdev descriptors can be closed only from the same thread that opened them. Vhost doesn't respect this rule. As expected, I was able to trigger assert(desc->thread == spdk_get_thread()) while closing a vhost scsi descriptor using the latest SPDK master. This could probably be fixed by always scheduling spdk_bdev_close() to the proper thread. Maybe vhost could even immediately assume the descriptor is closed and set its `desc` pointer to NULL without waiting for spdk_bdev_close() to actually be called. But why does the descriptor need to be closed from a specific thread in the first place? Would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread?

A descriptor cannot be closed until all associated channels have been destroyed - that's what the bdev programming guide says. When there are multiple I/O channels, there has to be some multi-thread management involved. Also, those channels can't be closed until all their pending I/O has finished. So closing a descriptor will likely have the following flow:

external request (e.g. bdev hotremove or some RPC) -> start draining I/O on all threads -> destroy each i/o channel after its pending i/o has finished -> on the last thread to destroy a channel schedule closing the desc to a proper thread -> close the desc
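
For illustration, the shape of that flow with today's APIs (a very rough
sketch; error handling and the exact io_device pointer are glossed over):

    static void
    drain_channel(struct spdk_io_channel_iter *i)
    {
            struct spdk_io_channel *ch = spdk_io_channel_iter_get_channel(i);

            /* once this thread's pending I/O has finished: */
            spdk_put_io_channel(ch);
            spdk_for_each_channel_continue(i, 0);
    }

    static void
    drain_done(struct spdk_io_channel_iter *i, int status)
    {
            struct spdk_bdev_desc *desc = spdk_io_channel_iter_get_ctx(i);

            /* last step - today this still has to hop to desc->thread */
            spdk_bdev_close(desc);
    }

    spdk_for_each_channel(io_device, drain_channel, desc, drain_done);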

This additional scheduling of spdk_bdev_close() looks completely unnecessary - it also forces the upper layer to maintain a pointer to the desc thread somewhere, because desc->thread is private to the bdev layer. So to repeat the question again -
would it be possible for spdk_bdev_close() to internally schedule itself on desc->thread, so that spdk_bdev_close() can be called from any thread?

D.



Thread overview: 13+ messages
2019-04-04 22:09 [SPDK] spdk_bdev_close() threading Stojaczyk, Dariusz
  -- strict thread matches above, loose matches on Subject: below --
2019-09-28  2:06 wuzhouhui
2019-04-08 14:45 Andrey Kuzmin
2019-04-05 19:32 Stojaczyk, Dariusz
2019-04-05 17:10 Walker, Benjamin
2019-04-05  8:13 Stojaczyk, Dariusz
2019-04-04 23:35 Harris, James R
2019-03-28 20:40 Harris, James R
2019-03-28 14:56 Stojaczyk, Dariusz
2019-03-28 14:55 Stojaczyk, Dariusz
2019-03-28 14:41 Harris, James R
2019-03-28 10:40 Andrey Kuzmin
2019-03-28  9:13 Stojaczyk, Dariusz
