From: Walker, Benjamin
Subject: Re: [SPDK] spdk_bdev_close() threading
Date: Fri, 05 Apr 2019 17:10:45 +0000
Message-ID: <130299e01ddaee0d11a847cc4bf7d54997cd6dae.camel@intel.com>
In-Reply-To: FBE7E039FA50BF47A673AD0BD3CD56A8462F9D1E@HASMSX105.ger.corp.intel.com
To: spdk@lists.01.org

On Fri, 2019-04-05 at 08:13 +0000, Stojaczyk, Dariusz wrote:
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
> > Sent: Friday, April 5, 2019 1:35 AM
> > To: Storage Performance Development Kit
> > Subject: Re: [SPDK] spdk_bdev_close() threading
> >
> > I managed to push it into the bdev layer with barely 70 lines of code:
> > https://review.gerrithub.io/c/spdk/spdk/+/450112
> >
> > This doesn't handle the race I mentioned above. Consider this case:
> >
> > Thread A opens a bdev.
> > Thread B later decides it's going to close the bdev for some reason. Maybe
> > the vhost session, nvmf subsystem, or iscsi target node is being deleted.
> > There may be other reasons for user applications (outside of SPDK) using the
> > bdev layer. So thread B calls spdk_bdev_close().
> > At the same time, that bdev is hot-removed. Thread A will then call back to
> > the user to say that the bdev was hot-removed. This thread calls
> > spdk_bdev_close() to close the descriptor (which is a normal reaction to
> > getting a hot-remove notification).
> >
> > To prevent these two threads from racing to call spdk_bdev_close(), the
> > caller needs to add locking or some other kind of synchronization. This is
> > on top of the extra locking added in this patch.
> >
> > I guess I don't understand why this can't be handled in vhost. Have vhost
> > send a message to the thread that constructed the SCSI device when it wants
> > to destruct it (which closes the underlying bdevs). Then that thread doesn't
> > have to worry about any races - if the bdev was hot-removed by the time the
> > message arrived, it just doesn't call spdk_bdev_close().
>
> I find the vhost logic very complicated already and would like to push the
> extra complexity somewhere else if possible. In that extra vhost message
> we would have to:
> * check that the desc wasn't already closed
> * check that the entire vhost device is still there - it could have been removed,
> e.g. on user request
> * make sure we close the proper descriptor. The one that this message
> was originally scheduled for might have been removed and a new descriptor
> has been put in its place
> * because of the above checks we need to lock anyway
>
> We don't have any mechanism in vhost to safely send a message to an
> arbitrary thread. Most of the multi-thread communication is done with
> spdk_vhost_dev_foreach_session(), which can be called from any thread
> and basically does:
> * execute a callback on each session's thread
> * lock the global mutex before each callback, so that it can safely access the
> global vhost device state (which is the usual case for foreach_session() - we
> allow changing the device state from any thread while holding the mutex, then
> use foreach_session() to make each session retrieve the device changes)
> * keep an internal refcount of pending asynchronous vhost device
> operations, so the vhost device can't be removed in the meantime
>
> It's not suitable for what you're asking.
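For reference, here is a rough sketch of the "close on the thread that opened
the descriptor" pattern Jim is describing. The bdev_ctx struct and the function
names are made up for illustration - this is not the actual vhost code - but the
spdk_* calls are the public bdev/thread APIs:

#include "spdk/stdinc.h"
#include "spdk/bdev.h"
#include "spdk/thread.h"

struct bdev_ctx {
    struct spdk_bdev_desc *desc;   /* set to NULL once closed */
    struct spdk_thread *owner;     /* thread that called spdk_bdev_open() */
};

/* Hot-remove callback: the bdev layer delivers this on the thread that
 * opened the descriptor, so it cannot race with close_msg() below. */
static void
bdev_hotremove_cb(void *arg)
{
    struct bdev_ctx *ctx = arg;

    if (ctx->desc != NULL) {
        spdk_bdev_close(ctx->desc);
        ctx->desc = NULL;
    }
}

/* Runs on ctx->owner. If the descriptor was already closed because of a
 * hot-remove, there is nothing left to do. */
static void
close_msg(void *arg)
{
    struct bdev_ctx *ctx = arg;

    if (ctx->desc != NULL) {
        spdk_bdev_close(ctx->desc);
        ctx->desc = NULL;
    }
}

/* Called on the owner thread when the device is created. */
static int
bdev_ctx_open(struct bdev_ctx *ctx, const char *bdev_name)
{
    struct spdk_bdev *bdev = spdk_bdev_get_by_name(bdev_name);

    if (bdev == NULL) {
        return -ENODEV;
    }
    ctx->owner = spdk_get_thread();
    return spdk_bdev_open(bdev, true, bdev_hotremove_cb, ctx, &ctx->desc);
}

/* Any other thread that wants to tear the device down sends a message to
 * the owner instead of calling spdk_bdev_close() directly. */
static void
bdev_ctx_close(struct bdev_ctx *ctx)
{
    spdk_thread_send_msg(ctx->owner, close_msg, ctx);
}

If the bdev was already hot-removed by the time close_msg() runs, desc is NULL
and the close is simply skipped - that is the behavior Jim describes. The cost
is the bookkeeping Dariusz lists above.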
Note that the NVMe-oF target has a designated thread for each NVMe-oF subsystem,
and all operations on that subsystem (including all bdev opens and closes) get
funneled there. It then has additional infrastructure to send a message to every
thread in the target to either update local data caches or to pause/resume I/O
on those threads.

NVMe-oF has some issues to work out with hotplug at the moment, but design-wise
I think the threading model it uses is the right one, and we need to move both
vhost and iSCSI to a similar model. The way the NVMe-oF target works is also
ideal for the upcoming lightweight threading changes.

> > I know there are some cases where we have to do locking in SPDK. But I'd
> > rather avoid adding it if possible. This stuff is inherently difficult to
> > really test and ensure we get right. This patch may only be about 120 lines
> > of code, but we don't have any tests to make sure it's correct.
>
> I'm not sure if you saw them - this patch contains some unit tests for
> spdk_bdev_close() called concurrently with spdk_bdev_unregister() on
> a different thread. I stress-tested the real case scenario in vhost locally,
> but as for the CI, those unit tests are the best I can push right now.
>
> D.
>
> > -Jim
> >
> > It doesn't affect any use cases which close the descriptor on the same
> > thread that opened it - it only lifts those single-thread limitations.
> >
> > D.
> >
> > > -Jim
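P.S. In case it helps, a rough sketch of the "designated thread plus message to
every thread" model I mention above, using only the public thread APIs. The
subsystem struct and the pause logic are illustrative, not the actual NVMe-oF
target code:

#include "spdk/stdinc.h"
#include "spdk/thread.h"

struct subsystem {
    struct spdk_thread *thread;  /* designated thread: all state changes happen here */
    bool paused;
};

/* Runs once on every SPDK thread, e.g. to quiesce I/O or drop a
 * per-thread cache. */
static void
pause_on_thread(void *arg)
{
    struct subsystem *ss = arg;

    (void)ss;  /* per-thread work would go here */
}

/* Completion: runs back on the thread that started the broadcast, i.e. the
 * subsystem's designated thread, so no lock is needed to update state. */
static void
pause_done(void *arg)
{
    struct subsystem *ss = arg;

    ss->paused = true;
}

/* Runs on ss->thread: kick off the broadcast from the designated thread. */
static void
pause_begin_msg(void *arg)
{
    struct subsystem *ss = arg;

    spdk_for_each_thread(pause_on_thread, ss, pause_done);
}

/* Any thread may request the pause; the request is funneled to the
 * subsystem's designated thread first. */
static void
subsystem_pause(struct subsystem *ss)
{
    spdk_thread_send_msg(ss->thread, pause_begin_msg, ss);
}

Because every state change funnels through ss->thread before fanning out to the
other threads, none of this needs a mutex.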