From: Andrey Kuzmin
Subject: Re: [SPDK] spdk_bdev_close() threading
Date: Mon, 08 Apr 2019 17:45:36 +0300
To: spdk@lists.01.org

On Fri, Apr 5, 2019, 20:10 Walker, Benjamin wrote:

> On Fri, 2019-04-05 at 08:13 +0000, Stojaczyk, Dariusz wrote:
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Harris, James R
> > > Sent: Friday, April 5, 2019 1:35 AM
> > > To: Storage Performance Development Kit
> > > Subject: Re: [SPDK] spdk_bdev_close() threading
> > >
> > > > I managed to push it into the bdev layer with barely 70 lines of code:
> > > > https://review.gerrithub.io/c/spdk/spdk/+/450112
> > >
> > > This doesn't handle the race I mentioned above. Consider this case:
> > >
> > > Thread A opens a bdev.
> > > Thread B later decides it's going to close the bdev for some reason.
> > > Maybe the vhost session, nvmf subsystem, or iscsi target node is being
> > > deleted. There may be other reasons for user applications (outside of
> > > SPDK) using the bdev layer. So thread B calls spdk_bdev_close().
> > > At the same time, that bdev is hot-removed. Thread A will then call back
> > > to the user that the bdev was hot removed. This thread calls
> > > spdk_bdev_close() to close the descriptor (which is a normal reaction to
> > > getting a hot remove notification).
> > >
> > > To prevent these two threads from racing to call spdk_bdev_close(), the
> > > caller needs to add locking or some other kind of synchronization. This
> > > is on top of the extra locking added in this patch.
> > >
> > > I guess I don't understand why this can't be handled in vhost. Have
> > > vhost send a message to the thread that constructed the SCSI device when
> > > it wants to destruct it (which closes the underlying bdevs). Then that
> > > thread doesn't have to worry about any races - if the bdev was
> > > hot-removed by the time the message arrived, it just doesn't call
> > > spdk_bdev_close().
> >
> > I find the vhost logic very complicated already and would like to push
> > the extra complexity somewhere else if possible. In that extra vhost
> > message we would have to:
> > * check if the desc wasn't closed before
> > * check if the entire vhost device is still there - it could have been
> >   removed e.g. on user request
> > * make sure we close the proper descriptor. The one that this message was
> >   originally scheduled for might have been removed and a new descriptor
> >   has been put in its place
> > * because of the above checks we need to lock anyway
> >
> > We don't have any mechanism in vhost to safely send a message to an
> > arbitrary thread. Most of the multi-thread communication is done with
> > spdk_vhost_dev_foreach_session(), which can be called from any thread and
> > basically does:
> > * execute a callback on each session's thread
> > * lock the global mutex before each callback, so that it can safely
> >   access the global vhost device state (which is the usual case for
> >   foreach_session() - we allow changing the device state from any thread
> >   holding the mutex, then use foreach_session() to make each session
> >   retrieve the device changes)
> > * keep the internal refcount of pending asynchronous vhost device
> >   operations, so the vhost device can't be removed in the meantime
> >
> > It's not suitable for what you're asking.
>
> Note that the NVMe-oF target has a designated thread for each NVMe-oF
> subsystem, and all operations on that subsystem (including all bdev opens
> and closes) get funneled there. It then has additional infrastructure to
> send a message to every thread in the target to either update local data
> caches or to pause/resume I/O on those threads.
>
> NVMe-oF has some issues to work out with hotplug at the moment, but
> design-wise I think the threading model it uses is the right one and we
> need to move both vhost and iSCSI to a similar model. The way the NVMe-oF
> target works is also ideal for the upcoming lightweight threading changes.
>

It's rather surprising, then, that the above threading model hasn't been
applied to the bdev layer, where shared resources - such as the bdev list -
and the associated calls that modify it - such as register/unregister - are
neither protected nor handled by a dedicated bdev manager thread. This gives
me a headache regarding how/if these resources will be managed safely in a
multithreaded environment.

Regards,
Andrey

> > > I know there are some cases where we have to do locking in SPDK. But
> > > I'd rather avoid adding it if possible. This stuff is inherently
> > > difficult to really test and ensure we get right. This patch may only
> > > be about 120 lines of code, but we don't have any tests to make sure
> > > it's correct.
> >
> > I'm not sure if you saw them - this patch contains some unit tests for
> > spdk_bdev_close() called concurrently with spdk_bdev_unregister() on a
> > different thread. I stress-tested the real case scenario in vhost
> > locally, but as for the CI, those unit tests are the best I can push
> > right now.
> >
> > D.
> >
> > > -Jim
> > >
> > > > It doesn't affect any use cases which close the descriptor on the
> > > > same thread that opened it - it only lifts those single-thread
> > > > limitations.
> > > >
> > > > D.
> > >
> > > -Jim
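
For illustration, here is a minimal sketch of the "funnel the close back to
the opening thread" pattern that Jim and Ben describe above. It is not code
from the patch under review; the my_dev context struct and the helper names
are hypothetical, and only spdk_bdev_open(), spdk_bdev_close(),
spdk_get_thread() and spdk_thread_send_msg() are real SPDK calls. Because
both the hot-remove callback and the teardown path end up running
my_dev_close_msg() on the thread that opened the descriptor, the descriptor
is closed exactly once and no extra locking is needed.

/*
 * Sketch only: funnel spdk_bdev_close() to the opening thread so that
 * hot-remove and user-initiated teardown never race on the close.
 */
#include <stdbool.h>
#include "spdk/bdev.h"
#include "spdk/thread.h"

struct my_dev {                          /* hypothetical per-device context */
    struct spdk_bdev_desc *desc;
    struct spdk_thread    *open_thread;  /* thread that called spdk_bdev_open() */
    bool                   closed;       /* only ever touched on open_thread */
};

/* Always runs on open_thread, so no locking is needed around 'closed'. */
static void
my_dev_close_msg(void *ctx)
{
    struct my_dev *dev = ctx;

    if (!dev->closed) {
        spdk_bdev_close(dev->desc);
        dev->closed = true;
    }
}

/*
 * Hot-remove callback registered in spdk_bdev_open(). The bdev layer
 * invokes it on the thread that opened the descriptor, so it can close
 * directly.
 */
static void
my_dev_hot_remove_cb(void *remove_ctx)
{
    my_dev_close_msg(remove_ctx);
}

/*
 * Teardown requested from an arbitrary thread (e.g. the session or
 * subsystem is being deleted). Instead of closing here, forward the
 * request to the opening thread; if the bdev was hot-removed by the time
 * the message arrives, the descriptor is already closed and nothing
 * happens.
 */
static void
my_dev_destruct(struct my_dev *dev)
{
    if (spdk_get_thread() == dev->open_thread) {
        my_dev_close_msg(dev);
    } else {
        spdk_thread_send_msg(dev->open_thread, my_dev_close_msg, dev);
    }
}

static int
my_dev_open(struct spdk_bdev *bdev, struct my_dev *dev)
{
    dev->open_thread = spdk_get_thread();
    dev->closed = false;

    return spdk_bdev_open(bdev, true, my_dev_hot_remove_cb, dev, &dev->desc);
}

This is essentially the model Ben describes for the NVMe-oF target: one
designated thread owns the descriptor, and every other thread communicates
with it through messages rather than shared state and locks.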