All of lore.kernel.org
 help / color / mirror / Atom feed
* [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
@ 2018-06-24 10:16 Ming Lei
  2018-06-24 16:33 ` Bart Van Assche
  2018-06-25 11:15 ` Ming Lei
  0 siblings, 2 replies; 8+ messages in thread
From: Ming Lei @ 2018-06-24 10:16 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Christoph Hellwig, linux-block,
	Omar Sandoval, Andrew Jones
  Cc: Martin K. Petersen, linux-scsi

Hi Guys,

Recently, I am figuring out solutions for removing synchronize_rcu() from
blk_cleanup_queue() so that no long delay is caused during SCSI lun probe[1],
especially from blk_mq_del_queue_tag_set(). This synchronize_rcu() is added
by commit 705cda97ee3abb6ea(blk-mq: Make it safe to use RCU to iterate over
blk_mq_tag_set.tag_list), and commit 6d8c6c0f97ad ("blk-mq: Restart a single
queue if tag sets are shared"), and call this as 'TAG_SHARED in restart'.

Basically speaking, the synchronize_rcu() can't be removed if we have
to restart all tags-shared queues in current way('TAG_SHARED in restart')
when one request is completed. Meantime blk-mq has used blk_mq_mark_tag_wait()
to deal with cross-queue driver tag allocation, which means the two mechanism
are  highly overlapped. Also SCSI has built-in RESTART, and not need
'TAG_SHARED in restart'.

We tried to remove shared-tag restart in 358a3a6bccb7 (blk-mq: don't handle
TAG_SHARED in restart) before, but it is reverted in commit 05b79413946d
(Revert "blk-mq: don't handle TAG_SHARED in restart") because it causes
regression in Bart's SRP test.

Now I am revisiting 'TAG_SHARED in restart' again for the long delay issue
of SCSI LUN probe. And found there is one bug in blk_mq_mark_tag_wait():

- hctx->dispatch_wait is added to wait queue by holding hctx->lock and
the wait queue's lock

- no hctx->lock is held when removing hctx->dispatch_wait from wait
  queue.

- so the two actions(add, remove) may happen meantime since
  hctx->dispatch_wait can be added to different wait queues.

Turns out this issue can be observed easily by applying the patches[2],
which is for removing 'TAG_SHARED in restart', then run simple shared-tag
null_blk test[4].

But if the hctx->lock is held in blk_mq_dispatch_wake(), as done in
patch [3], there isn't such issue at all, so it shows this issue is
related with hctx->lock, and adding/removing hctx->dispatch_wait to
wait queue. But the way of holding hctx->lock in irq context may not
be one accepted solution, since it has been avoided from the beginning
of blk-mq.

So does anyone have better ideas for this issue?

So far, follows what I thought of:

1) fix the mechanism of blk_mq_mark_tag_wait(), and removing
'TAG_SHARED in restart', then we can fix the long delay issue of
SCSI LUN probe, meantime performance can got improved, as I observed,
this way may improve IOPS by 20~30% in multi-LUN scsi_debug test.
But the issue is how to fix?

2) keep 'TAG_SHARED in restart' and let it cover the issue of
blk_mq_mark_tag_wait() as now, then try to improve 'TAG_SHARED in restart'
in another way, so that performance can be better, and synchronize_rcu()
can be removed from blk_mq_del_queue_tag_set(), then SCSI LUN probe long
delay can be fixed. I had wrote patches to do that last year. If anyone
is interested, I may post it out.

Or other ideas, any comments & ideas are welcome!


[1] [PATCH] blk-mq: avoid to synchronize rcu inside blk_cleanup_queue() 
	https://marc.info/?t=152948737900041&r=1&w=2

[2] [PATCH] blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() 
	https://github.com/ming1/linux/commit/36a0ff197531e02a955472059acfc436b8ed97e7

[3] [PATCH] blk-mq: holding hctx->lock when removing hctx->dispatch_wait from wai… 
	https://github.com/ming1/linux/commit/cb7c822d663552da62479942458444c5081149a1

[4] test script
	http://people.redhat.com/minlei/tests/tools/null_blk_test-restart

Thanks,
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-24 10:16 [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue Ming Lei
@ 2018-06-24 16:33 ` Bart Van Assche
  2018-06-25  7:24   ` Ming Lei
  2018-06-25 11:15 ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2018-06-24 16:33 UTC (permalink / raw)
  To: hch, drjones, linux-block, osandov, ming.lei, axboe
  Cc: linux-scsi, martin.petersen

T24gU3VuLCAyMDE4LTA2LTI0IGF0IDE4OjE2ICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gTm93
IEkgYW0gcmV2aXNpdGluZyAnVEFHX1NIQVJFRCBpbiByZXN0YXJ0JyBhZ2FpbiBmb3IgdGhlIGxv
bmcgZGVsYXkgaXNzdWUNCj4gb2YgU0NTSSBMVU4gcHJvYmUuIEFuZCBmb3VuZCB0aGVyZSBpcyBv
bmUgYnVnIGluIGJsa19tcV9tYXJrX3RhZ193YWl0KCk6DQo+IA0KPiAtIGhjdHgtPmRpc3BhdGNo
X3dhaXQgaXMgYWRkZWQgdG8gd2FpdCBxdWV1ZSBieSBob2xkaW5nIGhjdHgtPmxvY2sgYW5kDQo+
IHRoZSB3YWl0IHF1ZXVlJ3MgbG9jaw0KPiANCj4gLSBubyBoY3R4LT5sb2NrIGlzIGhlbGQgd2hl
biByZW1vdmluZyBoY3R4LT5kaXNwYXRjaF93YWl0IGZyb20gd2FpdA0KPiAgIHF1ZXVlLg0KPiAN
Cj4gLSBzbyB0aGUgdHdvIGFjdGlvbnMoYWRkLCByZW1vdmUpIG1heSBoYXBwZW4gbWVhbnRpbWUg
c2luY2UNCj4gICBoY3R4LT5kaXNwYXRjaF93YWl0IGNhbiBiZSBhZGRlZCB0byBkaWZmZXJlbnQg
d2FpdCBxdWV1ZXMuDQo+IA0KPiBUdXJucyBvdXQgdGhpcyBpc3N1ZSBjYW4gYmUgb2JzZXJ2ZWQg
ZWFzaWx5IGJ5IGFwcGx5aW5nIHRoZSBwYXRjaGVzWzJdLA0KPiB3aGljaCBpcyBmb3IgcmVtb3Zp
bmcgJ1RBR19TSEFSRUQgaW4gcmVzdGFydCcsIHRoZW4gcnVuIHNpbXBsZSBzaGFyZWQtdGFnDQo+
IG51bGxfYmxrIHRlc3RbNF0uDQo+IA0KPiBCdXQgaWYgdGhlIGhjdHgtPmxvY2sgaXMgaGVsZCBp
biBibGtfbXFfZGlzcGF0Y2hfd2FrZSgpLCBhcyBkb25lIGluDQo+IHBhdGNoIFszXSwgdGhlcmUg
aXNuJ3Qgc3VjaCBpc3N1ZSBhdCBhbGwsIHNvIGl0IHNob3dzIHRoaXMgaXNzdWUgaXMNCj4gcmVs
YXRlZCB3aXRoIGhjdHgtPmxvY2ssIGFuZCBhZGRpbmcvcmVtb3ZpbmcgaGN0eC0+ZGlzcGF0Y2hf
d2FpdCB0bw0KPiB3YWl0IHF1ZXVlLiBCdXQgdGhlIHdheSBvZiBob2xkaW5nIGhjdHgtPmxvY2sg
aW4gaXJxIGNvbnRleHQgbWF5IG5vdA0KPiBiZSBvbmUgYWNjZXB0ZWQgc29sdXRpb24sIHNpbmNl
IGl0IGhhcyBiZWVuIGF2b2lkZWQgZnJvbSB0aGUgYmVnaW5uaW5nDQo+IG9mIGJsay1tcS4NCj4g
DQo+IFNvIGRvZXMgYW55b25lIGhhdmUgYmV0dGVyIGlkZWFzIGZvciB0aGlzIGlzc3VlPw0KPiAN
Cj4gU28gZmFyLCBmb2xsb3dzIHdoYXQgSSB0aG91Z2h0IG9mOg0KPiANCj4gMSkgZml4IHRoZSBt
ZWNoYW5pc20gb2YgYmxrX21xX21hcmtfdGFnX3dhaXQoKSwgYW5kIHJlbW92aW5nDQo+ICdUQUdf
U0hBUkVEIGluIHJlc3RhcnQnLCB0aGVuIHdlIGNhbiBmaXggdGhlIGxvbmcgZGVsYXkgaXNzdWUg
b2YNCj4gU0NTSSBMVU4gcHJvYmUsIG1lYW50aW1lIHBlcmZvcm1hbmNlIGNhbiBnb3QgaW1wcm92
ZWQsIGFzIEkgb2JzZXJ2ZWQsDQo+IHRoaXMgd2F5IG1heSBpbXByb3ZlIElPUFMgYnkgMjB+MzAl
IGluIG11bHRpLUxVTiBzY3NpX2RlYnVnIHRlc3QuDQo+IEJ1dCB0aGUgaXNzdWUgaXMgaG93IHRv
IGZpeD8NCj4gDQo+IDIpIGtlZXAgJ1RBR19TSEFSRUQgaW4gcmVzdGFydCcgYW5kIGxldCBpdCBj
b3ZlciB0aGUgaXNzdWUgb2YNCj4gYmxrX21xX21hcmtfdGFnX3dhaXQoKSBhcyBub3csIHRoZW4g
dHJ5IHRvIGltcHJvdmUgJ1RBR19TSEFSRUQgaW4gcmVzdGFydCcNCj4gaW4gYW5vdGhlciB3YXks
IHNvIHRoYXQgcGVyZm9ybWFuY2UgY2FuIGJlIGJldHRlciwgYW5kIHN5bmNocm9uaXplX3JjdSgp
DQo+IGNhbiBiZSByZW1vdmVkIGZyb20gYmxrX21xX2RlbF9xdWV1ZV90YWdfc2V0KCksIHRoZW4g
U0NTSSBMVU4gcHJvYmUgbG9uZw0KPiBkZWxheSBjYW4gYmUgZml4ZWQuIEkgaGFkIHdyb3RlIHBh
dGNoZXMgdG8gZG8gdGhhdCBsYXN0IHllYXIuIElmIGFueW9uZQ0KPiBpcyBpbnRlcmVzdGVkLCBJ
IG1heSBwb3N0IGl0IG91dC4NCj4gDQo+IE9yIG90aGVyIGlkZWFzLCBhbnkgY29tbWVudHMgJiBp
ZGVhcyBhcmUgd2VsY29tZSENCg0KUGxlYXNlIGhhdmUgYSBsb29rIGF0IFtQQVRDSF0gYmxrLW1x
OiBGaXggYSByYWNlIGNvbmRpdGlvbiBpbiBibGtfbXFfbWFya190YWdfd2FpdCgpLA0KMTYgSmFu
IDIwMTggKGh0dHBzOi8vd3d3Lm1haWwtYXJjaGl2ZS5jb20vbGludXgtYmxvY2tAdmdlci5rZXJu
ZWwub3JnL21zZzE3NDc0Lmh0bWwpLg0KDQpUaGFua3MsDQoNCkJhcnQuDQoNCg0KDQoNCg==

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-24 16:33 ` Bart Van Assche
@ 2018-06-25  7:24   ` Ming Lei
  2018-06-26 20:19     ` Bart Van Assche
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2018-06-25  7:24 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, drjones, linux-block, osandov, axboe, linux-scsi, martin.petersen

On Sun, Jun 24, 2018 at 04:33:21PM +0000, Bart Van Assche wrote:
> On Sun, 2018-06-24 at 18:16 +0800, Ming Lei wrote:
> > Now I am revisiting 'TAG_SHARED in restart' again for the long delay issue
> > of SCSI LUN probe. And found there is one bug in blk_mq_mark_tag_wait():
> > 
> > - hctx->dispatch_wait is added to wait queue by holding hctx->lock and
> > the wait queue's lock
> > 
> > - no hctx->lock is held when removing hctx->dispatch_wait from wait
> >   queue.
> > 
> > - so the two actions(add, remove) may happen meantime since
> >   hctx->dispatch_wait can be added to different wait queues.
> > 
> > Turns out this issue can be observed easily by applying the patches[2],
> > which is for removing 'TAG_SHARED in restart', then run simple shared-tag
> > null_blk test[4].
> > 
> > But if the hctx->lock is held in blk_mq_dispatch_wake(), as done in
> > patch [3], there isn't such issue at all, so it shows this issue is
> > related with hctx->lock, and adding/removing hctx->dispatch_wait to
> > wait queue. But the way of holding hctx->lock in irq context may not
> > be one accepted solution, since it has been avoided from the beginning
> > of blk-mq.
> > 
> > So does anyone have better ideas for this issue?
> > 
> > So far, follows what I thought of:
> > 
> > 1) fix the mechanism of blk_mq_mark_tag_wait(), and removing
> > 'TAG_SHARED in restart', then we can fix the long delay issue of
> > SCSI LUN probe, meantime performance can got improved, as I observed,
> > this way may improve IOPS by 20~30% in multi-LUN scsi_debug test.
> > But the issue is how to fix?
> > 
> > 2) keep 'TAG_SHARED in restart' and let it cover the issue of
> > blk_mq_mark_tag_wait() as now, then try to improve 'TAG_SHARED in restart'
> > in another way, so that performance can be better, and synchronize_rcu()
> > can be removed from blk_mq_del_queue_tag_set(), then SCSI LUN probe long
> > delay can be fixed. I had wrote patches to do that last year. If anyone
> > is interested, I may post it out.
> > 
> > Or other ideas, any comments & ideas are welcome!
> 
> Please have a look at [PATCH] blk-mq: Fix a race condition in blk_mq_mark_tag_wait(),
> 16 Jan 2018 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg17474.html).

Thanks for sharing it, looks I miss your findings.

Your commit log describes the issue exactly, but unfortunately the patch
isn't correct, because hctx->lock isn't held in the removing path of
blk_mq_dispatch_wake(). Given 'hctx->dispatch_wait' may be added to
different wait queues, it isn't enough to hold wait queue lock and
hctx->lock in add path only. Otherwise, removing path can be seen as
'lockless' from the view point of add path.

If you apply the patch[1] and the patch of '[PATCH] blk-mq: Fix a race condition
in blk_mq_mark_tag_wait()', and run test script in [2], you will see
that IO hang can still be triggered easily.

I have cooked one patch of 'blk-mq: holding hctx->lock when removing
hctx->dispatch_wait from wai'[3], which can fix the issue, but need to
change all current pin_lock(hctx->lock) into spin_lock_irq() since
blk_mq_dispatch_wake() is usually done in irq context. This kind of
change might not be an accepted way, that is why I report it out
and start the discussion.


[1] https://github.com/ming1/linux/commit/36a0ff197531e02a955472059acfc436b8ed97e7
[2] http://people.redhat.com/minlei/tests/tools/null_blk_test-restart
[2] https://github.com/ming1/linux/commit/cb7c822d663552da62479942458444c5081149a1

Thanks,
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-24 10:16 [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue Ming Lei
  2018-06-24 16:33 ` Bart Van Assche
@ 2018-06-25 11:15 ` Ming Lei
  1 sibling, 0 replies; 8+ messages in thread
From: Ming Lei @ 2018-06-25 11:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, linux-block,
	Omar Sandoval, Andrew Jones, Martin K. Petersen, Linux SCSI List

On Sun, Jun 24, 2018 at 6:16 PM, Ming Lei <ming.lei@redhat.com> wrote:
> Hi Guys,
>
> Recently, I am figuring out solutions for removing synchronize_rcu() from
> blk_cleanup_queue() so that no long delay is caused during SCSI lun probe[1],
> especially from blk_mq_del_queue_tag_set(). This synchronize_rcu() is added
> by commit 705cda97ee3abb6ea(blk-mq: Make it safe to use RCU to iterate over
> blk_mq_tag_set.tag_list), and commit 6d8c6c0f97ad ("blk-mq: Restart a single
> queue if tag sets are shared"), and call this as 'TAG_SHARED in restart'.
>
> Basically speaking, the synchronize_rcu() can't be removed if we have
> to restart all tags-shared queues in current way('TAG_SHARED in restart')
> when one request is completed. Meantime blk-mq has used blk_mq_mark_tag_wait()
> to deal with cross-queue driver tag allocation, which means the two mechanism
> are  highly overlapped. Also SCSI has built-in RESTART, and not need
> 'TAG_SHARED in restart'.
>
> We tried to remove shared-tag restart in 358a3a6bccb7 (blk-mq: don't handle
> TAG_SHARED in restart) before, but it is reverted in commit 05b79413946d
> (Revert "blk-mq: don't handle TAG_SHARED in restart") because it causes
> regression in Bart's SRP test.
>
> Now I am revisiting 'TAG_SHARED in restart' again for the long delay issue
> of SCSI LUN probe. And found there is one bug in blk_mq_mark_tag_wait():
>
> - hctx->dispatch_wait is added to wait queue by holding hctx->lock and
> the wait queue's lock
>
> - no hctx->lock is held when removing hctx->dispatch_wait from wait
>   queue.
>
> - so the two actions(add, remove) may happen meantime since
>   hctx->dispatch_wait can be added to different wait queues.
>
> Turns out this issue can be observed easily by applying the patches[2],
> which is for removing 'TAG_SHARED in restart', then run simple shared-tag
> null_blk test[4].
>
> But if the hctx->lock is held in blk_mq_dispatch_wake(), as done in
> patch [3], there isn't such issue at all, so it shows this issue is
> related with hctx->lock, and adding/removing hctx->dispatch_wait to
> wait queue. But the way of holding hctx->lock in irq context may not
> be one accepted solution, since it has been avoided from the beginning
> of blk-mq.
>
> So does anyone have better ideas for this issue?
>
> So far, follows what I thought of:
>
> 1) fix the mechanism of blk_mq_mark_tag_wait(), and removing
> 'TAG_SHARED in restart', then we can fix the long delay issue of
> SCSI LUN probe, meantime performance can got improved, as I observed,
> this way may improve IOPS by 20~30% in multi-LUN scsi_debug test.
> But the issue is how to fix?
>
> 2) keep 'TAG_SHARED in restart' and let it cover the issue of
> blk_mq_mark_tag_wait() as now, then try to improve 'TAG_SHARED in restart'
> in another way, so that performance can be better, and synchronize_rcu()
> can be removed from blk_mq_del_queue_tag_set(), then SCSI LUN probe long
> delay can be fixed. I had wrote patches to do that last year. If anyone
> is interested, I may post it out.
>
> Or other ideas, any comments & ideas are welcome!

Looks introducing one new lock to protect hctx->dispatch_wait is fine.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-25  7:24   ` Ming Lei
@ 2018-06-26 20:19     ` Bart Van Assche
  2018-06-27  0:49       ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2018-06-26 20:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: hch, drjones, linux-block, osandov, axboe, linux-scsi, martin.petersen

On 06/25/18 00:25, Ming Lei wrote:
> On Sun, Jun 24, 2018 at 04:33:21PM +0000, Bart Van Assche wrote:
>> On Sun, 2018-06-24 at 18:16 +0800, Ming Lei wrote:
>>> Now I am revisiting 'TAG_SHARED in restart' again for the long delay issue
>>> of SCSI LUN probe. And found there is one bug in blk_mq_mark_tag_wait():
>>>
>>> - hctx->dispatch_wait is added to wait queue by holding hctx->lock and
>>> the wait queue's lock
>>>
>>> - no hctx->lock is held when removing hctx->dispatch_wait from wait
>>>    queue.
>>>
>>> - so the two actions(add, remove) may happen meantime since
>>>    hctx->dispatch_wait can be added to different wait queues.
>>>
>>> Turns out this issue can be observed easily by applying the patches[2],
>>> which is for removing 'TAG_SHARED in restart', then run simple shared-tag
>>> null_blk test[4].
>>>
>>> But if the hctx->lock is held in blk_mq_dispatch_wake(), as done in
>>> patch [3], there isn't such issue at all, so it shows this issue is
>>> related with hctx->lock, and adding/removing hctx->dispatch_wait to
>>> wait queue. But the way of holding hctx->lock in irq context may not
>>> be one accepted solution, since it has been avoided from the beginning
>>> of blk-mq.
>>>
>>> So does anyone have better ideas for this issue?
>>>
>>> So far, follows what I thought of:
>>>
>>> 1) fix the mechanism of blk_mq_mark_tag_wait(), and removing
>>> 'TAG_SHARED in restart', then we can fix the long delay issue of
>>> SCSI LUN probe, meantime performance can got improved, as I observed,
>>> this way may improve IOPS by 20~30% in multi-LUN scsi_debug test.
>>> But the issue is how to fix?
>>>
>>> 2) keep 'TAG_SHARED in restart' and let it cover the issue of
>>> blk_mq_mark_tag_wait() as now, then try to improve 'TAG_SHARED in restart'
>>> in another way, so that performance can be better, and synchronize_rcu()
>>> can be removed from blk_mq_del_queue_tag_set(), then SCSI LUN probe long
>>> delay can be fixed. I had wrote patches to do that last year. If anyone
>>> is interested, I may post it out.
>>>
>>> Or other ideas, any comments & ideas are welcome!
>>
>> Please have a look at [PATCH] blk-mq: Fix a race condition in blk_mq_mark_tag_wait(),
>> 16 Jan 2018 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg17474.html).
> 
> Thanks for sharing it, looks I miss your findings.
> 
> Your commit log describes the issue exactly, but unfortunately the patch
> isn't correct, because hctx->lock isn't held in the removing path of
> blk_mq_dispatch_wake(). Given 'hctx->dispatch_wait' may be added to
> different wait queues, it isn't enough to hold wait queue lock and
> hctx->lock in add path only. Otherwise, removing path can be seen as
> 'lockless' from the view point of add path.

I disagree. My patch is such that the waitqueue lock is held both around 
the code that adds hctx->dispatch_wait to a waitqueue and around the 
code that removes hctx->dispatch_wait from a waitqueue. No locking has 
been added in blk_mq_dispatch_wake() because its caller holds the 
appropriate wait queue lock.

Bart.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-26 20:19     ` Bart Van Assche
@ 2018-06-27  0:49       ` Ming Lei
  2018-06-27 20:00         ` Bart Van Assche
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2018-06-27  0:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, drjones, linux-block, osandov, axboe, linux-scsi, martin.petersen

On Tue, Jun 26, 2018 at 01:19:45PM -0700, Bart Van Assche wrote:
> On 06/25/18 00:25, Ming Lei wrote:
> > On Sun, Jun 24, 2018 at 04:33:21PM +0000, Bart Van Assche wrote:
> > > On Sun, 2018-06-24 at 18:16 +0800, Ming Lei wrote:
> > > > Now I am revisiting 'TAG_SHARED in restart' again for the long delay issue
> > > > of SCSI LUN probe. And found there is one bug in blk_mq_mark_tag_wait():
> > > > 
> > > > - hctx->dispatch_wait is added to wait queue by holding hctx->lock and
> > > > the wait queue's lock
> > > > 
> > > > - no hctx->lock is held when removing hctx->dispatch_wait from wait
> > > >    queue.
> > > > 
> > > > - so the two actions(add, remove) may happen meantime since
> > > >    hctx->dispatch_wait can be added to different wait queues.
> > > > 
> > > > Turns out this issue can be observed easily by applying the patches[2],
> > > > which is for removing 'TAG_SHARED in restart', then run simple shared-tag
> > > > null_blk test[4].
> > > > 
> > > > But if the hctx->lock is held in blk_mq_dispatch_wake(), as done in
> > > > patch [3], there isn't such issue at all, so it shows this issue is
> > > > related with hctx->lock, and adding/removing hctx->dispatch_wait to
> > > > wait queue. But the way of holding hctx->lock in irq context may not
> > > > be one accepted solution, since it has been avoided from the beginning
> > > > of blk-mq.
> > > > 
> > > > So does anyone have better ideas for this issue?
> > > > 
> > > > So far, follows what I thought of:
> > > > 
> > > > 1) fix the mechanism of blk_mq_mark_tag_wait(), and removing
> > > > 'TAG_SHARED in restart', then we can fix the long delay issue of
> > > > SCSI LUN probe, meantime performance can got improved, as I observed,
> > > > this way may improve IOPS by 20~30% in multi-LUN scsi_debug test.
> > > > But the issue is how to fix?
> > > > 
> > > > 2) keep 'TAG_SHARED in restart' and let it cover the issue of
> > > > blk_mq_mark_tag_wait() as now, then try to improve 'TAG_SHARED in restart'
> > > > in another way, so that performance can be better, and synchronize_rcu()
> > > > can be removed from blk_mq_del_queue_tag_set(), then SCSI LUN probe long
> > > > delay can be fixed. I had wrote patches to do that last year. If anyone
> > > > is interested, I may post it out.
> > > > 
> > > > Or other ideas, any comments & ideas are welcome!
> > > 
> > > Please have a look at [PATCH] blk-mq: Fix a race condition in blk_mq_mark_tag_wait(),
> > > 16 Jan 2018 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg17474.html).
> > 
> > Thanks for sharing it, looks I miss your findings.
> > 
> > Your commit log describes the issue exactly, but unfortunately the patch
> > isn't correct, because hctx->lock isn't held in the removing path of
> > blk_mq_dispatch_wake(). Given 'hctx->dispatch_wait' may be added to
> > different wait queues, it isn't enough to hold wait queue lock and
> > hctx->lock in add path only. Otherwise, removing path can be seen as
> > 'lockless' from the view point of add path.
> 
> I disagree. My patch is such that the waitqueue lock is held both around the
> code that adds hctx->dispatch_wait to a waitqueue and around the code that
> removes hctx->dispatch_wait from a waitqueue. No locking has been added in
> blk_mq_dispatch_wake() because its caller holds the appropriate wait queue
> lock.

Only the waitqueue's lock is held for blk_mq_dispatch_wake(), but this
hctx->dispatch_wait can be added to another waitqueue, so it isn't
enough to just hold waitqueue's lock.

If you run my test script mentioned, your patch isn't enough to fix
the IO hang issue.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-27  0:49       ` Ming Lei
@ 2018-06-27 20:00         ` Bart Van Assche
  2018-06-27 23:24           ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2018-06-27 20:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: hch, drjones, linux-block, osandov, axboe, linux-scsi, martin.petersen

On 06/26/18 17:49, Ming Lei wrote:
> Only the waitqueue's lock is held for blk_mq_dispatch_wake(), but this
> hctx->dispatch_wait can be added to another waitqueue, so it isn't
> enough to just hold waitqueue's lock.

Hello Ming,

That makes sense to me. I will post an update that addresses this issue.

Bart.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue
  2018-06-27 20:00         ` Bart Van Assche
@ 2018-06-27 23:24           ` Ming Lei
  0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2018-06-27 23:24 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, drjones, linux-block, osandov, axboe, linux-scsi, martin.petersen

On Wed, Jun 27, 2018 at 01:00:08PM -0700, Bart Van Assche wrote:
> On 06/26/18 17:49, Ming Lei wrote:
> > Only the waitqueue's lock is held for blk_mq_dispatch_wake(), but this
> > hctx->dispatch_wait can be added to another waitqueue, so it isn't
> > enough to just hold waitqueue's lock.
> 
> Hello Ming,
> 
> That makes sense to me. I will post an update that addresses this issue.

I have sent a new patch days ago:

	https://marc.info/?l=linux-block&m=152998658713265&w=2

Thanks,
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-06-27 23:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-24 10:16 [blk-mq Bug] race on removing hctx->dispatch_wait from wait queue Ming Lei
2018-06-24 16:33 ` Bart Van Assche
2018-06-25  7:24   ` Ming Lei
2018-06-26 20:19     ` Bart Van Assche
2018-06-27  0:49       ` Ming Lei
2018-06-27 20:00         ` Bart Van Assche
2018-06-27 23:24           ` Ming Lei
2018-06-25 11:15 ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.