From: Stojaczyk, Dariusz
Subject: Re: [SPDK] spdk_bdev_close() threading
Date: Fri, 05 Apr 2019 08:13:40 +0000
To: spdk@lists.01.org

> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> Sent: Friday, April 5, 2019 1:35 AM
> To: Storage Performance Development Kit
> Subject: Re: [SPDK] spdk_bdev_close() threading
>
> > I managed to push it into the bdev layer with barely 70 lines of code:
> > https://review.gerrithub.io/c/spdk/spdk/+/450112
>
> This doesn't handle the race I mentioned above. Consider this case:
>
> Thread A opens a bdev.
> Thread B later decides it's going to close the bdev for some reason. Maybe
> the vhost session, nvmf subsystem, or iscsi target node is being deleted.
> There may be other reasons for user applications (outside of SPDK) using the
> bdev layer. So thread B calls spdk_bdev_close().
> At the same time, that bdev is hot-removed. Thread A will then call back to
> the user that the bdev was hot-removed. That thread calls
> spdk_bdev_close() to close the descriptor (which is a normal reaction to
> getting a hot-remove notification).
>
> To prevent these two threads from racing to call spdk_bdev_close(), the
> caller needs to add locking or some other kind of synchronization. This is on
> top of the extra locking added in this patch.
>
> I guess I don't understand why this can't be handled in vhost. Have vhost
> send a message to the thread that constructed the SCSI device when it wants
> to destruct it (which closes the underlying bdevs). Then that thread doesn't
> have to worry about any races - if the bdev was hot-removed by the time the
> message arrived, it just doesn't call spdk_bdev_close().

I find the vhost logic very complicated already and would like to push the
extra complexity somewhere else if possible. In that extra vhost message we
would have to do all of the following (see the sketch after this list):

* check that the descriptor wasn't already closed
* check that the entire vhost device is still there - it could have been
  removed e.g. on user request
* make sure we close the proper descriptor. The one this message was
  originally scheduled for might have been removed and a new descriptor
  put in its place
* because of the above checks, lock anyway
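Here's roughly what such a message handler would have to look like. This is
a minimal sketch, not code from vhost or from the patch - my_vhost_dev, its
fields, and request_close() are invented for illustration; only
spdk_thread_send_msg(), spdk_bdev_close(), and struct spdk_bdev_desc are the
real SPDK primitives:

#include <pthread.h>
#include <stdbool.h>
#include "spdk/bdev.h"
#include "spdk/thread.h"

/* Hypothetical per-device state; the field names are made up. */
struct my_vhost_dev {
	pthread_mutex_t mutex;
	struct spdk_bdev_desc *desc;	/* NULL once closed */
	bool removed;			/* set when the vhost device is deleted */
};

/* Executed on the thread that originally opened the descriptor. */
static void
close_desc_msg(void *ctx)
{
	struct my_vhost_dev *vdev = ctx;

	pthread_mutex_lock(&vdev->mutex);
	/* The hot-remove callback may have closed the descriptor while this
	 * message was in flight, or the whole device may be gone already -
	 * hence all the checks, and all of them under a lock. */
	if (!vdev->removed && vdev->desc != NULL) {
		spdk_bdev_close(vdev->desc);
		vdev->desc = NULL;
	}
	pthread_mutex_unlock(&vdev->mutex);
}

/* Callable from any thread; creator is the thread that opened the desc. */
static void
request_close(struct my_vhost_dev *vdev, struct spdk_thread *creator)
{
	spdk_thread_send_msg(creator, close_desc_msg, vdev);
}

So even with the message-passing approach the handler ends up doing the same
checks under the same lock - which is exactly the complexity I'd like to keep
out of vhost.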
We don't have any mechanism in vhost to safely send a message to an
arbitrary thread. Most of the multi-thread communication is done with
spdk_vhost_dev_foreach_session(), which can be called from any thread and
basically does the following:

* execute a callback on each session's thread
* lock the global mutex before each callback, so that the callback can
  safely access the global vhost device state. That's the usual case for
  foreach_session() - we allow changing the device state from any thread
  while holding the mutex, then use foreach_session() to make each session
  retrieve the device changes.
* keep an internal refcount of pending asynchronous vhost device
  operations, so the vhost device can't be removed in the meantime

It's not suitable for what you're asking.

> I know there are some cases where we have to do locking in SPDK. But I'd
> rather avoid adding it if possible. This stuff is inherently difficult to
> really test and ensure we get right. This patch may only be about 120
> lines of code, but we don't have any tests to make sure it's correct.

I'm not sure if you saw them - this patch contains some unit tests for
spdk_bdev_close() called concurrently with spdk_bdev_unregister() on a
different thread. I stress-tested the real-case scenario in vhost locally,
but as for the CI, those unit tests are the best I can push right now.

D.
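P.S. The concurrent case those unit tests exercise has roughly this shape.
It's a simplified sketch rather than an excerpt from the patch - g_bdev, the
callbacks, and the assertion flag are placeholders, and set_thread() /
poll_threads() are the helpers from SPDK's multi-threaded unit test
scaffolding (test/common/lib/ut_multithread.c):

#include "spdk_cunit.h"
#include "spdk/bdev.h"
#include "spdk/bdev_module.h"	/* full struct spdk_bdev definition */

static struct spdk_bdev g_bdev;		/* placeholder; registration elided */
static bool g_unregister_done = false;

static void
remove_cb(void *remove_ctx)
{
	/* hot-remove notification; nothing to do in this sketch */
}

static void
unregister_cb(void *cb_arg, int rc)
{
	g_unregister_done = true;
}

static void
close_vs_unregister_test(void)
{
	struct spdk_bdev_desc *desc = NULL;

	set_thread(0);
	CU_ASSERT(spdk_bdev_open(&g_bdev, true, remove_cb, NULL, &desc) == 0);

	/* Start unregistering on thread 1... */
	set_thread(1);
	spdk_bdev_unregister(&g_bdev, unregister_cb, NULL);

	/* ...while thread 0 closes its descriptor at the same time. */
	set_thread(0);
	spdk_bdev_close(desc);

	/* Drain the message queues; unregister can only complete once the
	 * descriptor is closed. */
	poll_threads();
	CU_ASSERT(g_unregister_done == true);
}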