From: Andrey Kuzmin <andrey.v.kuzmin at gmail.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] spdk_bdev_close() threading
Date: Mon, 08 Apr 2019 17:45:36 +0300	[thread overview]
Message-ID: <CANvN+emhZk_mof466n_uvj2GPMKorhwmMg_Hyya5EPdwJeUdow@mail.gmail.com> (raw)
In-Reply-To: 130299e01ddaee0d11a847cc4bf7d54997cd6dae.camel@intel.com


On Fri, Apr 5, 2019, 20:10 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Fri, 2019-04-05 at 08:13 +0000, Stojaczyk, Dariusz wrote:
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris,
> > > James R
> > > Sent: Friday, April 5, 2019 1:35 AM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] spdk_bdev_close() threading
> > >
> > > <SNIP>
> > >
> > >     I managed to push it into the bdev layer with barely 70 lines
> > >     of code:
> > >     https://review.gerrithub.io/c/spdk/spdk/+/450112
> > >
> > > This doesn't handle the race I mentioned above.  Consider this case:
> > >
> > > Thread A opens a bdev.
> > > Thread B later decides it's going to close the bdev for some reason.
> > > Maybe the vhost session, nvmf subsystem, or iscsi target node is
> > > being deleted. There may be other reasons for user applications
> > > (outside of SPDK) using the bdev layer.  So thread B calls
> > > spdk_bdev_close().
> > > At the same time, that bdev is hot-removed.  Thread A will then call
> > > back to the user that the bdev was hot removed.  This thread calls
> > > spdk_bdev_close() to close the descriptor (which is a normal reaction
> > > to getting a hot remove notification).
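
[AK: To make the race concrete, here is a minimal sketch of the guard
such a caller would need once closing from any thread is allowed. It is
not from the patch under review; the context struct and all names are
hypothetical.]

#include <pthread.h>
#include <stdbool.h>

#include "spdk/bdev.h"

struct my_desc_ctx {                /* hypothetical caller context */
	struct spdk_bdev_desc *desc;
	pthread_mutex_t lock;
	bool closed;
};

static void
my_close_once(struct my_desc_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (!ctx->closed) {         /* first caller wins, the other is a no-op */
		spdk_bdev_close(ctx->desc);
		ctx->closed = true;
	}
	pthread_mutex_unlock(&ctx->lock);
}

/* Thread A: hot-remove notification (spdk_bdev_remove_cb_t). */
static void
my_hotremove_cb(void *remove_ctx)
{
	my_close_once(remove_ctx);
}

/* Thread B: session/subsystem/target-node teardown path. */
static void
my_teardown(struct my_desc_ctx *ctx)
{
	my_close_once(ctx);
}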
> > >
> > > To prevent these two threads from racing to call spdk_bdev_close(),
> > > the caller needs to add locking or some other kind of
> > > synchronization.  This is on top of the extra locking added in this
> > > patch.
> > >
> > > I guess I don't understand why this can't be handled in vhost.  Have
> > > vhost send a message to the thread that constructed the SCSI device
> > > when it wants to destruct it (which closes the underlying bdevs).
> > > Then that thread doesn't have to worry about any races - if the bdev
> > > was hot-removed by the time the message arrived, it just doesn't call
> > > spdk_bdev_close().
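
[AK: A sketch of the pattern Jim describes - funneling the close to the
thread that opened the descriptor via spdk_thread_send_msg(), so that no
lock is needed on the close path. The struct and names below are mine.]

#include "spdk/bdev.h"
#include "spdk/thread.h"

struct my_dev {                             /* hypothetical bookkeeping */
	struct spdk_thread *md_thread;      /* thread that opened the desc */
	struct spdk_bdev_desc *desc;        /* set to NULL once closed */
};

static void
my_destruct_msg(void *arg)
{
	struct my_dev *dev = arg;

	/* Runs only on md_thread, so no lock is needed: if the hot-remove
	 * callback already closed the descriptor, simply skip the close. */
	if (dev->desc != NULL) {
		spdk_bdev_close(dev->desc);
		dev->desc = NULL;
	}
}

/* Callable from any thread that wants to destruct the device. */
static void
my_destruct(struct my_dev *dev)
{
	spdk_thread_send_msg(dev->md_thread, my_destruct_msg, dev);
}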
> >
> > I find the vhost logic very complicated already and would like to push
> > the extra complexity somewhere else if possible. In that extra vhost
> > message we would have to:
> > * check that the desc wasn't closed before
> > * check that the entire vhost device is still there - it could have
> >   been removed, e.g. on user request
> > * make sure we close the proper descriptor - the one this message was
> >   originally scheduled for might have been removed and a new
> >   descriptor put in its place
> > * because of the above checks, we need to lock anyway
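
[AK: If I read the list above right, the message handler would end up
looking roughly like the sketch below - every check done under the global
mutex, which is exactly Dariusz's point that the lock is needed anyway.
All names are illustrative, not actual vhost code.]

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#include "spdk/bdev.h"

static pthread_mutex_t g_my_vhost_mutex = PTHREAD_MUTEX_INITIALIZER;

struct my_vdev {
	struct spdk_bdev_desc *desc;    /* NULL once closed */
	uint64_t desc_gen;              /* bumped whenever desc is replaced */
	bool removed;                   /* device deleted on user request */
};

struct my_close_ctx {
	struct my_vdev *vdev;
	uint64_t desc_gen;              /* generation this close was meant for */
};

static void
my_vhost_close_msg(void *arg)
{
	struct my_close_ctx *ctx = arg;

	pthread_mutex_lock(&g_my_vhost_mutex);
	if (!ctx->vdev->removed &&                  /* device still there?  */
	    ctx->vdev->desc != NULL &&              /* desc not closed yet? */
	    ctx->vdev->desc_gen == ctx->desc_gen) { /* same desc, not a new one? */
		spdk_bdev_close(ctx->vdev->desc);
		ctx->vdev->desc = NULL;
	}
	pthread_mutex_unlock(&g_my_vhost_mutex);
	free(ctx);
}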
> >
> > We don't have any mechanism in vhost to safely send a message to an
> > arbitrary thread. Most of the multi-thread communication is done with
> > spdk_vhost_dev_foreach_session(), which can be called from any thread
> > and basically does:
> >  * execute a callback on each session's thread
> >  * lock the global mutex before each callback, so that it can safely
> >    access the global vhost device state (which is the usual case for
> >    foreach_session() - we allow changing the device state from any
> >    thread holding the mutex, then use foreach_session() to make each
> >    session retrieve the device changes)
> >  * keep an internal refcount of pending asynchronous vhost device
> >    operations, so the vhost device can't be removed in the meantime
> >
> > It's not suitable for what you're asking.
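
[AK: For readers unfamiliar with vhost internals, the per-session step of
the mechanism described above boils down to something like the sketch
below - generic code in my words, not the actual implementation.]

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

extern pthread_mutex_t g_my_vhost_mutex;    /* guards global device state */

struct my_session;                          /* opaque here */

struct my_vdev {
	uint32_t pending_ops;               /* keeps the device alive */
};

struct my_foreach_ctx {
	struct my_vdev *vdev;
	struct my_session *session;
	void (*fn)(struct my_vdev *vdev, struct my_session *session);
};

/* Delivered to each session's thread, e.g. via spdk_thread_send_msg(). */
static void
my_session_msg(void *arg)
{
	struct my_foreach_ctx *ctx = arg;

	pthread_mutex_lock(&g_my_vhost_mutex);
	ctx->fn(ctx->vdev, ctx->session);   /* may safely read device state */
	ctx->vdev->pending_ops--;           /* op no longer blocks removal */
	pthread_mutex_unlock(&g_my_vhost_mutex);
	free(ctx);
}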
>
> Note that the NVMe-oF target has a designated thread for each NVMe-oF
> subsystem, and all operations on that subsystem (including all bdev
> opens and closes) get funneled there. It then has additional
> infrastructure to send a message to every thread in the target to either
> update local data caches or to pause/resume I/O on those threads.
>
> NVMe-oF has some issues to work out with hotplug at the moment, but
> design-wise I think the threading model it uses is the right one, and we
> need to move both vhost and iSCSI to a similar model. The way the
> NVMe-oF target works is also ideal for the upcoming lightweight
> threading changes.
>
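[AK: For reference, the "send a message to every thread" infrastructure
Ben mentions maps naturally onto the public spdk_for_each_channel()
primitive. Below is a hedged sketch of a pause broadcast; the per-thread
poll-group flag and all names are hypothetical.]

#include <stdbool.h>

#include "spdk/thread.h"

struct my_poll_group {                      /* per-thread channel context */
	bool paused;
};

static void
my_pause_on_thread(struct spdk_io_channel_iter *i)
{
	struct spdk_io_channel *ch = spdk_io_channel_iter_get_channel(i);
	struct my_poll_group *pg = spdk_io_channel_get_ctx(ch);

	pg->paused = true;                  /* this thread stops submitting I/O */
	spdk_for_each_channel_continue(i, 0);
}

static void
my_pause_done(struct spdk_io_channel_iter *i, int status)
{
	/* Every thread has observed the pause; safe to reconfigure now. */
}

/* From the subsystem's designated thread; my_subsystem is the pointer
 * that was registered with spdk_io_device_register(). */
static void
my_pause_all(void *my_subsystem)
{
	spdk_for_each_channel(my_subsystem, my_pause_on_thread, NULL,
			      my_pause_done);
}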

It's rather surprising, then, that the above threading model hasn't been
applied to the bdev layer, where shared resources - such as the bdev
list - and the calls that modify them - such as register/unregister - are
neither lock-protected nor funneled to a dedicated bdev manager thread.
That leaves me wondering how, or whether, these resources can be managed
safely in a multithreaded environment.
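
Until that happens, the best an application can do is impose the
discipline itself. A sketch (hypothetical names) of funneling every
unregister through one designated "bdev manager" thread:

#include "spdk/bdev.h"
#include "spdk/thread.h"

static struct spdk_thread *g_bdev_mgr_thread;   /* chosen at app init */

static void
my_unregister_msg(void *arg)
{
	/* Only this thread ever modifies the global bdev list. */
	spdk_bdev_unregister(arg, NULL, NULL);
}

static void
my_unregister(struct spdk_bdev *bdev)
{
	spdk_thread_send_msg(g_bdev_mgr_thread, my_unregister_msg, bdev);
}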

Regards,
Andrey

>
> >
> > > I know there are some cases where we have to do locking in SPDK.
> > > But I'd rather avoid adding it if possible.  This stuff is
> > > inherently difficult to test and ensure we get right.  This patch
> > > may only be about 120 lines of code, but we don't have any tests to
> > > make sure it's correct.
> >
> > I'm not sure if you saw them - this patch contains some unit tests for
> > spdk_bdev_close() called concurrently with spdk_bdev_unregister() on a
> > different thread. I stress-tested the real-world scenario in vhost
> > locally, but as for the CI, those unit tests are the best I can push
> > right now.
> >
> > D.
> >
> > > -Jim
> > >
> > >
> > >     It doesn't affect any use cases which close the descriptor on
> > >     the same thread that opened it - it only lifts those
> > >     single-thread limitations.
> > >
> > >     D.
> > >
> > >     >
> > >     > -Jim
> > >     >
> > >     >
