* [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
@ 2017-11-05 12:10 Ming Lei
  2017-11-05 15:38 ` Bart Van Assche
  2017-11-10  6:17 ` Ming Lei
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Lei @ 2017-11-05 12:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, Ming Lei, stable

blk-mq never respects the queue DEAD state, which may cause a use-after-free
on any kind of queue resource. This patch respects the rule by calling
blk_mq_quiesce_queue() when the queue is marked DEAD.

This patch fixes the following kernel crash:

[   42.170824] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   42.172011] IP: blk_mq_flush_busy_ctxs+0x5a/0xe0
[   42.172011] PGD 25d8d1067 P4D 25d8d1067 PUD 25d458067 PMD 0
[   42.172011] Oops: 0000 [#1] PREEMPT SMP
[   42.172011] Dumping ftrace buffer:
[   42.172011]    (ftrace buffer empty)
[   42.172011] Modules linked in: scsi_debug ebtable_filter ebtables ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc fuse iptable_filter ip_tables sd_mod sg mptsas mptscsih mptbase crc32c_intel scsi_transport_sas nvme lpc_ich serio_raw ahci virtio_scsi libahci libata nvme_core binfmt_misc dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
[   42.172011] CPU: 0 PID: 2750 Comm: fio Not tainted 4.14.0-rc7.blk_mq_io_hang+ #507
[   42.172011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-1.fc25 04/01/2014
[   42.172011] task: ffff88025dc18000 task.stack: ffffc900028c4000
[   42.172011] RIP: 0010:blk_mq_flush_busy_ctxs+0x5a/0xe0
[   42.172011] RSP: 0018:ffffc900028c7be0 EFLAGS: 00010246
[   42.172011] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8802775907c8
[   42.172011] RDX: 0000000000000000 RSI: ffffc900028c7c50 RDI: ffff88025de25c00
[   42.172011] RBP: ffffc900028c7c28 R08: 0000000000000000 R09: ffff8802595692c0
[   42.172011] R10: ffffc900028c7e50 R11: 0000000000000008 R12: ffffc900028c7c50
[   42.172011] R13: ffff88025de25cd8 R14: ffffc900028c7d78 R15: ffff88025de25c00
[   42.172011] FS:  00007faa8653f7c0(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
[   42.172011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.172011] CR2: 0000000000000000 CR3: 00000002593df006 CR4: 00000000003606f0
[   42.172011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   42.172011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   42.172011] Call Trace:
[   42.172011]  blk_mq_sched_dispatch_requests+0x1b4/0x1f0
[   42.172011]  ? preempt_schedule+0x27/0x30
[   42.172011]  __blk_mq_run_hw_queue+0x8b/0xa0
[   42.172011]  __blk_mq_delay_run_hw_queue+0xb7/0x100
[   42.172011]  blk_mq_run_hw_queue+0x14/0x20
[   42.172011]  blk_mq_sched_insert_requests+0xda/0x120
[   42.172011]  blk_mq_flush_plug_list+0x179/0x280
[   42.172011]  blk_flush_plug_list+0x102/0x290
[   42.172011]  blk_finish_plug+0x2c/0x40
[   42.172011]  do_io_submit+0x3fa/0x780
[   42.172011]  SyS_io_submit+0x10/0x20
[   42.172011]  ? SyS_io_submit+0x10/0x20
[   42.172011]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   42.172011] RIP: 0033:0x7faa80ff8697
[   42.172011] RSP: 002b:00007ffe08236de8 EFLAGS: 00000206 ORIG_RAX: 00000000000000d1
[   42.172011] RAX: ffffffffffffffda RBX: 0000000001f1e600 RCX: 00007faa80ff8697
[   42.172011] RDX: 000000000208e638 RSI: 0000000000000001 RDI: 00007faa6ecae000
[   42.172011] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000001f1e0e0
[   42.172011] R10: 0000000000000001 R11: 0000000000000206 R12: 00007faa614f19e0
[   42.172011] R13: 00007faa614ff0b0 R14: 00007faa614fef50 R15: 0000000100000000
[   42.172011] Code: e0 00 00 00 c7 45 bc 00 00 00 00 85 c0 74 76 4c 8d af d8 00 00 00 49 89 ff 44 8b 45 bc 4c 89 c0 49 c1 e0 06 4d 03 87 e8 00 00 00 <49> 83 38 00 4d 89 c6 74 41 41 8b 8f dc 00 00 00 41 89 c4 31 db
[   42.172011] RIP: blk_mq_flush_busy_ctxs+0x5a/0xe0 RSP: ffffc900028c7be0
[   42.172011] CR2: 0000000000000000
[   42.172011] ---[ end trace 2432dfddf9b84061 ]---
[   42.172011] Kernel panic - not syncing: Fatal exception
[   42.172011] Dumping ftrace buffer:
[   42.172011]    (ftrace buffer empty)
[   42.172011] Kernel Offset: disabled
[   42.172011] ---[ end Kernel panic - not syncing: Fatal exception

Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 048be4aa6024..0b121f29e3b1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set(QUEUE_FLAG_DEAD, q);
 	spin_unlock_irq(lock);
 
+	/* respect queue DEAD via quiesce for blk-mq */
+	if (q->mq_ops)
+		blk_mq_quiesce_queue(q);
+
 	/* for synchronous bio-based driver finish in-flight integrity i/o */
 	blk_flush_integrity();
 
-- 
2.9.5
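
For context, blk_mq_quiesce_queue() in this kernel era works roughly as the
sketch below (simplified; details and field names are approximate): it marks
the queue quiesced and then waits for every dispatch section that is already
running to drain via RCU/SRCU, which is exactly the guarantee
blk_cleanup_queue() needs before it frees queue resources.

void blk_mq_quiesce_queue(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;
	bool rcu = false;

	/* stop new dispatching from being started */
	blk_mq_quiesce_queue_nowait(q);

	/* wait for dispatch sections that are already running to finish */
	queue_for_each_hw_ctx(q, hctx, i) {
		if (hctx->flags & BLK_MQ_F_BLOCKING)
			synchronize_srcu(hctx->queue_rq_srcu);
		else
			rcu = true;
	}
	if (rcu)
		synchronize_rcu();
}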

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-05 12:10 [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue Ming Lei
@ 2017-11-05 15:38 ` Bart Van Assche
  2017-11-06  3:44   ` Ming Lei
  2017-11-10 16:30   ` Bart Van Assche
  2017-11-10  6:17 ` Ming Lei
  1 sibling, 2 replies; 9+ messages in thread
From: Bart Van Assche @ 2017-11-05 15:38 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: osandov, hare

On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 048be4aa6024..0b121f29e3b1 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set(QUEUE_FLAG_DEAD, q);
>  	spin_unlock_irq(lock);
>  
> +	/* respect queue DEAD via quiesce for blk-mq */
> +	if (q->mq_ops)
> +		blk_mq_quiesce_queue(q);
> +
>  	/* for synchronous bio-based driver finish in-flight integrity i/o */
>  	blk_flush_integrity();

Have you considered to change the blk_freeze_queue_start() call in
blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
advantage that no new if (q->mq_ops) test has to be introduced.

Additionally, the call trace in the description of this patch shows that the
comment in blk_execute_rq_nowait() is wrong. How about changing that comment
as follows?

@@ -57,10 +57,12 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	rq->end_io = done;
 
 	/*
-	 * don't check dying flag for MQ because the request won't
-	 * be reused after dying flag is set
+	 * blk_freeze_queue() must be called before transitioning a queue
+	 * into the "dead" state to guarantee that blk_execute_rq_nowait()
+	 * won't attempt to queue a request on a "dead" blk-mq queue.
 	 */
 	if (q->mq_ops) {
+		WARN_ON_ONCE(blk_queue_dead(q));
 		blk_mq_sched_insert_request(rq, at_head, true, false, false);
 		return;
 	}

Bart.
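
Concretely, the alternative Bart suggests would look roughly like the sketch
below (the surrounding body of blk_set_queue_dying() is abbreviated and
approximate): the "dying" transition itself would wait for outstanding
requests instead of only starting the freeze.

void blk_set_queue_dying(struct request_queue *q)
{
	spin_lock_irq(q->queue_lock);
	queue_flag_set(QUEUE_FLAG_DYING, q);
	spin_unlock_irq(q->queue_lock);

	/* was: blk_freeze_queue_start(q), which only begins the freeze */
	blk_freeze_queue(q);	/* also waits for q_usage_counter to drain */

	/* ... wake up sleepers so they notice the DYING flag (omitted) ... */
}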

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-05 15:38 ` Bart Van Assche
@ 2017-11-06  3:44   ` Ming Lei
  2017-11-06 16:34     ` Bart Van Assche
  2017-11-10 16:30   ` Bart Van Assche
  1 sibling, 1 reply; 9+ messages in thread
From: Ming Lei @ 2017-11-06  3:44 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, osandov, hare

On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 048be4aa6024..0b121f29e3b1 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> >  	spin_unlock_irq(lock);
> >  
> > +	/* respect queue DEAD via quiesce for blk-mq */
> > +	if (q->mq_ops)
> > +		blk_mq_quiesce_queue(q);
> > +
> >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> >  	blk_flush_integrity();
> 
> Have you considered to change the blk_freeze_queue_start() call in
> blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> advantage that no new if (q->mq_ops) test has to be introduced.

That approach has nothing to do with this issue and can't fix it either.
Not to mention that we hold q->sysfs_lock before calling blk_set_queue_dying(),
so there may be a risk of deadlock.

The issue is that there isn't any request in the queue (the queue is frozen), but
a dispatch may still happen. Let me explain it a bit:

1) several IO submit paths are in progress
2) requests from all these paths are inserted into the queue, but they may be
dispatched to the LLD in only one of those paths; the other paths may still move
on to dispatch even after all these requests have completed (which means
blk_mq_freeze_queue_wait() returns at that point)
3) the dispatch after queue dead happens and causes the use-after-free,
because we never respect queue dead for blk-mq.

That is exactly what QUEUE_DEAD is supposed to protect against, and quiesce can
be made to respect QUEUE_DEAD perfectly and easily.
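
Put as a timeline, the interleaving looks something like this (a hypothetical
two-CPU schedule; the function names are the ones from the call trace in the
patch description):

/*
 *  CPU0 (submit path A)                    CPU1 (submit path B)
 *  ----------------------------------      ----------------------------------
 *  inserts rq into the scheduler queue
 *                                          blk_mq_run_hw_queue()
 *                                            dispatches rq to the LLD
 *                                            rq completes, queue ref dropped
 *  [q_usage_counter reaches zero here, so blk_mq_freeze_queue_wait() returns
 *   and blk_cleanup_queue() proceeds, eventually freeing ctx/hctx resources]
 *
 *  blk_mq_run_hw_queue()
 *    blk_mq_sched_dispatch_requests()
 *      blk_mq_flush_busy_ctxs()   <-- walks the freed ctx maps: the crash above
 */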

> 
> Additionally, the call trace in the description of this patch shows that the
> comment in blk_execute_rq_nowait() is wrong. How about changing that comment
> as follows?
> 
> @@ -57,10 +57,12 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
>  	rq->end_io = done;
>  
>  	/*
> -	 * don't check dying flag for MQ because the request won't
> -	 * be reused after dying flag is set
> +	 * blk_freeze_queue() must be called before transitioning a queue
> +	 * into the "dead" state to guarantee that blk_execute_rq_nowait()
> +	 * won't attempt to queue a request on a "dead" blk-mq queue.

blk_freeze_queue() can't cover queue dead, as I explained above; it returns as
soon as there is no request in the queue, while a dispatch may still be in progress.

>  	 */
>  	if (q->mq_ops) {
> +		WARN_ON_ONCE(blk_queue_dead(q));
>  		blk_mq_sched_insert_request(rq, at_head, true, false, false);
>  		return;
>  	}

No, this WARN_ON() can never be triggered, because the request isn't
completed yet.

-- 
Ming

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-06  3:44   ` Ming Lei
@ 2017-11-06 16:34     ` Bart Van Assche
  2017-11-07  2:27       ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2017-11-06 16:34 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, osandov, axboe, hare

On Mon, 2017-11-06 at 11:44 +0800, Ming Lei wrote:
> On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 048be4aa6024..0b121f29e3b1 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >  	spin_unlock_irq(lock);
> > >  
> > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > +	if (q->mq_ops)
> > > +		blk_mq_quiesce_queue(q);
> > > +
> > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > >  	blk_flush_integrity();
> > 
> > Have you considered to change the blk_freeze_queue_start() call in
> > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > advantage that no new if (q->mq_ops) test has to be introduced.
> 
> That approach isn't nothing to do with this issue, and can't fix this issue
> too.

Sorry but I disagree. My opinion is that the change I proposed is a more
elegant way to avoid that blk_mq_run_hw_queue() tries to execute a request
after blk_cleanup_queue() has started.

> Not mention we hold q->sysfs_lock before calling blk_set_queue_dying(),
> there may be risk to cause deadlock.

Sorry but I disagree again. q->sysfs_lock is not obtained from inside any
code that runs the queue so there is no deadlock risk. Additionally, I doubt
that it is useful to obtain q->sysfs_lock from inside blk_cleanup_queue().
The queue attributes that are exported through sysfs can be modified safely
until these sysfs attributes are removed.

> The issue is that there isn't any request in queue(queue is frozen), but
> dispatch still may happen, let me explain it a bit:
> 
> 1) there are several IO submit paths in-progress
> 2) requests from all these paths are inserted to queue, but may dispatch to
> LLD in only one of these paths, but other paths may still move on to dispatch
> even all these requests are completed(that means blk_mq_freeze_queue_wait()
> returns at that time)
> 3) the dispatch after queue dead happens and causes the use-after-free,
> because we never respect queue dead for blk-mq.

After a queue has been marked "dying" and after blk_freeze_queue() has
returned it is guaranteed that all requests have finished and that all future
blk_get_request() calls will fail. Hence .queue_rq() won't be called anymore.
Any code that runs a queue (the __blk_mq_run_hw_queue() / blk_mq_make_request()
callers) must hold a reference on that queue. Running a queue even after
blk_cleanup_queue() finished is safe as long as the caller holds a reference
on that queue. So the scenario described above won't lead to a crash.

Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-06 16:34     ` Bart Van Assche
@ 2017-11-07  2:27       ` Ming Lei
  2017-11-08  2:10         ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2017-11-07  2:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, osandov, axboe, hare

On Mon, Nov 06, 2017 at 04:34:21PM +0000, Bart Van Assche wrote:
> On Mon, 2017-11-06 at 11:44 +0800, Ming Lei wrote:
> > On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 048be4aa6024..0b121f29e3b1 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > >  	spin_unlock_irq(lock);
> > > >  
> > > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > > +	if (q->mq_ops)
> > > > +		blk_mq_quiesce_queue(q);
> > > > +
> > > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > > >  	blk_flush_integrity();
> > > 
> > > Have you considered to change the blk_freeze_queue_start() call in
> > > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > > advantage that no new if (q->mq_ops) test has to be introduced.
> > 
> > That approach isn't nothing to do with this issue, and can't fix this issue
> > too.
> 
> Sorry but I disagree. My opinion is that the change I proposed is a more
> elegant way to avoid that blk_mq_run_hw_queue() tries to execute a request
> after blk_cleanup_queue() has started.
> 
> > Not mention we hold q->sysfs_lock before calling blk_set_queue_dying(),
> > there may be risk to cause deadlock.
> 
> Sorry but I disagree again. q->sysfs_lock is not obtained from inside any
> code that runs the queue so there is no deadlock risk. Additionally, I doubt
> that it is useful to obtain q->sysfs_lock from inside blk_cleanup_queue().
> The queue attributes that are exported through sysfs can be modified safely
> until these sysfs attributes are removed.
> 
> > The issue is that there isn't any request in queue(queue is frozen), but
> > dispatch still may happen, let me explain it a bit:
> > 
> > 1) there are several IO submit paths in-progress
> > 2) requests from all these paths are inserted to queue, but may dispatch to
> > LLD in only one of these paths, but other paths may still move on to dispatch
> > even all these requests are completed(that means blk_mq_freeze_queue_wait()
> > returns at that time)
> > 3) the dispatch after queue dead happens and causes the use-after-free,
> > because we never respect queue dead for blk-mq.
> 
> After a queue has been marked "dying" and after blk_freeze_queue() has
> returned it is guaranteed that all requests have finished and that all future
> blk_get_request() calls will fail. Hence .queue_rq() won't be called anymore.
> Any code that runs a queue (the __blk_mq_run_hw_queue() / blk_mq_make_request()
> callers) must hold a reference on that queue. Running a queue even after

The reference is released via blk_queue_exit() when the request is freed in
blk_mq_free_request(), but the dispatch may still be in progress after the request
is freed. I suggest you take a look at the following words carefully,
and if you think something is wrong, please comment on it directly:

	1) there are several IO submit paths in-progress

	2) requests from all these paths are inserted to queue, but may dispatch to
	LLD in only one of these paths, but other paths may still move on to dispatch
	even all these requests are completed(that means blk_mq_freeze_queue_wait()
	returns at that time)

	3) the dispatch after queue dead happens and causes the use-after-free,
	because we never respect queue dead for blk-mq.

blk_freeze_queue() can make sure that no .queue_rq() will be run, but it
can't make sure that the rest of the dispatch path (dequeueing, checking whether
there is work in the queue, ...) has finished.
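
That distinction is visible in the freeze path itself. Roughly (a simplified
sketch; details approximate for this kernel version), it only tracks request
references through q_usage_counter and knows nothing about a hardware-queue
run that no longer holds a request:

void blk_freeze_queue(struct request_queue *q)
{
	/* kill q_usage_counter so blk_queue_enter() starts failing */
	blk_freeze_queue_start(q);

	/* wait until every outstanding request reference has been dropped */
	blk_mq_freeze_queue_wait(q);

	/*
	 * Nothing here waits for a concurrent blk_mq_run_hw_queue() that is
	 * still scanning ctx/hctx state after its request has completed.
	 */
}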

-- 
Ming

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-07  2:27       ` Ming Lei
@ 2017-11-08  2:10         ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2017-11-08  2:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, osandov, axboe, hare

On Tue, Nov 07, 2017 at 10:27:11AM +0800, Ming Lei wrote:
> On Mon, Nov 06, 2017 at 04:34:21PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-11-06 at 11:44 +0800, Ming Lei wrote:
> > > On Sun, Nov 05, 2017 at 03:38:49PM +0000, Bart Van Assche wrote:
> > > > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > > index 048be4aa6024..0b121f29e3b1 100644
> > > > > --- a/block/blk-core.c
> > > > > +++ b/block/blk-core.c
> > > > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > >  	spin_unlock_irq(lock);
> > > > >  
> > > > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > > > +	if (q->mq_ops)
> > > > > +		blk_mq_quiesce_queue(q);
> > > > > +
> > > > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > > > >  	blk_flush_integrity();
> > > > 
> > > > Have you considered to change the blk_freeze_queue_start() call in
> > > > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > > > advantage that no new if (q->mq_ops) test has to be introduced.
> > > 
> > > That approach isn't nothing to do with this issue, and can't fix this issue
> > > too.
> > 
> > Sorry but I disagree. My opinion is that the change I proposed is a more
> > elegant way to avoid that blk_mq_run_hw_queue() tries to execute a request
> > after blk_cleanup_queue() has started.
> > 
> > > Not mention we hold q->sysfs_lock before calling blk_set_queue_dying(),
> > > there may be risk to cause deadlock.
> > 
> > Sorry but I disagree again. q->sysfs_lock is not obtained from inside any
> > code that runs the queue so there is no deadlock risk. Additionally, I doubt
> > that it is useful to obtain q->sysfs_lock from inside blk_cleanup_queue().
> > The queue attributes that are exported through sysfs can be modified safely
> > until these sysfs attributes are removed.
> > 
> > > The issue is that there isn't any request in queue(queue is frozen), but
> > > dispatch still may happen, let me explain it a bit:
> > > 
> > > 1) there are several IO submit paths in-progress
> > > 2) requests from all these paths are inserted to queue, but may dispatch to
> > > LLD in only one of these paths, but other paths may still move on to dispatch
> > > even all these requests are completed(that means blk_mq_freeze_queue_wait()
> > > returns at that time)
> > > 3) the dispatch after queue dead happens and causes the use-after-free,
> > > because we never respect queue dead for blk-mq.
> > 
> > After a queue has been marked "dying" and after blk_freeze_queue() has
> > returned it is guaranteed that all requests have finished and that all future
> > blk_get_request() calls will fail. Hence .queue_rq() won't be called anymore.
> > Any code that runs a queue (the __blk_mq_run_hw_queue() / blk_mq_make_request()
> > callers) must hold a reference on that queue. Running a queue even after
> 
> The reference is released via blk_queue_exit() when the request is freed in
> blk_mq_free_request(), but the dispatch may still be in-progress after the request
> is freed. I suggest you to take a look at the following words carefully,
> and if you think something is wrong, please comment on it directly:
> 
> 	1) there are several IO submit paths in-progress
> 
> 	2) requests from all these paths are inserted to queue, but may dispatch to
> 	LLD in only one of these paths, but other paths may still move on to dispatch
> 	even all these requests are completed(that means blk_mq_freeze_queue_wait()
> 	returns at that time)
> 
> 	3) the dispatch after queue dead happens and causes the use-after-free,
> 	because we never respect queue dead for blk-mq.
> 
> blk_freeze_queue() can make sure that no .queue_rq() can be run, but
> can't make sure the other dispatch part(like dequeue, check if there is
> work in queue, ...) is finished.

Hi Jens,

Please consider this patch for v4.14, since the in-progress dispatch
after queue DEAD may cause memory corruption (some operations on ctx and its lock
may write to the freed memory).

Thanks,
Ming

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-05 12:10 [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue Ming Lei
  2017-11-05 15:38 ` Bart Van Assche
@ 2017-11-10  6:17 ` Ming Lei
  1 sibling, 0 replies; 9+ messages in thread
From: Ming Lei @ 2017-11-10  6:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Omar Sandoval, Bart Van Assche, Hannes Reinecke, stable

On Sun, Nov 05, 2017 at 08:10:08PM +0800, Ming Lei wrote:
> blk-mq never respects queue dead, and this may cause use-after-free on
> any kind of queue resources. This patch respects the rule by calling
> blk_mq_quiesce_queue() when queue is marked as DEAD.
> 
> This patch fixes the following kernel crash:
> 
> [   42.170824] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [   42.172011] IP: blk_mq_flush_busy_ctxs+0x5a/0xe0
> [   42.172011] PGD 25d8d1067 P4D 25d8d1067 PUD 25d458067 PMD 0
> [   42.172011] Oops: 0000 [#1] PREEMPT SMP
> [   42.172011] Dumping ftrace buffer:
> [   42.172011]    (ftrace buffer empty)
> [   42.172011] Modules linked in: scsi_debug ebtable_filter ebtables ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c bridge stp llc fuse iptable_filter ip_tables sd_mod sg mptsas mptscsih mptbase crc32c_intel scsi_transport_sas nvme lpc_ich serio_raw ahci virtio_scsi libahci libata nvme_core binfmt_misc dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi null_blk configs
> [   42.172011] CPU: 0 PID: 2750 Comm: fio Not tainted 4.14.0-rc7.blk_mq_io_hang+ #507
> [   42.172011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-1.fc25 04/01/2014
> [   42.172011] task: ffff88025dc18000 task.stack: ffffc900028c4000
> [   42.172011] RIP: 0010:blk_mq_flush_busy_ctxs+0x5a/0xe0
> [   42.172011] RSP: 0018:ffffc900028c7be0 EFLAGS: 00010246
> [   42.172011] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8802775907c8
> [   42.172011] RDX: 0000000000000000 RSI: ffffc900028c7c50 RDI: ffff88025de25c00
> [   42.172011] RBP: ffffc900028c7c28 R08: 0000000000000000 R09: ffff8802595692c0
> [   42.172011] R10: ffffc900028c7e50 R11: 0000000000000008 R12: ffffc900028c7c50
> [   42.172011] R13: ffff88025de25cd8 R14: ffffc900028c7d78 R15: ffff88025de25c00
> [   42.172011] FS:  00007faa8653f7c0(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
> [   42.172011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   42.172011] CR2: 0000000000000000 CR3: 00000002593df006 CR4: 00000000003606f0
> [   42.172011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   42.172011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   42.172011] Call Trace:
> [   42.172011]  blk_mq_sched_dispatch_requests+0x1b4/0x1f0
> [   42.172011]  ? preempt_schedule+0x27/0x30
> [   42.172011]  __blk_mq_run_hw_queue+0x8b/0xa0
> [   42.172011]  __blk_mq_delay_run_hw_queue+0xb7/0x100
> [   42.172011]  blk_mq_run_hw_queue+0x14/0x20
> [   42.172011]  blk_mq_sched_insert_requests+0xda/0x120
> [   42.172011]  blk_mq_flush_plug_list+0x179/0x280
> [   42.172011]  blk_flush_plug_list+0x102/0x290
> [   42.172011]  blk_finish_plug+0x2c/0x40
> [   42.172011]  do_io_submit+0x3fa/0x780
> [   42.172011]  SyS_io_submit+0x10/0x20
> [   42.172011]  ? SyS_io_submit+0x10/0x20
> [   42.172011]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [   42.172011] RIP: 0033:0x7faa80ff8697
> [   42.172011] RSP: 002b:00007ffe08236de8 EFLAGS: 00000206 ORIG_RAX: 00000000000000d1
> [   42.172011] RAX: ffffffffffffffda RBX: 0000000001f1e600 RCX: 00007faa80ff8697
> [   42.172011] RDX: 000000000208e638 RSI: 0000000000000001 RDI: 00007faa6ecae000
> [   42.172011] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000001f1e0e0
> [   42.172011] R10: 0000000000000001 R11: 0000000000000206 R12: 00007faa614f19e0
> [   42.172011] R13: 00007faa614ff0b0 R14: 00007faa614fef50 R15: 0000000100000000
> [   42.172011] Code: e0 00 00 00 c7 45 bc 00 00 00 00 85 c0 74 76 4c 8d af d8 00 00 00 49 89 ff 44 8b 45 bc 4c 89 c0 49 c1 e0 06 4d 03 87 e8 00 00 00 <49> 83 38 00 4d 89 c6 74 41 41 8b 8f dc 00 00 00 41 89 c4 31 db
> [   42.172011] RIP: blk_mq_flush_busy_ctxs+0x5a/0xe0 RSP: ffffc900028c7be0
> [   42.172011] CR2: 0000000000000000
> [   42.172011] ---[ end trace 2432dfddf9b84061 ]---
> [   42.172011] Kernel panic - not syncing: Fatal exception
> [   42.172011] Dumping ftrace buffer:
> [   42.172011]    (ftrace buffer empty)
> [   42.172011] Kernel Offset: disabled
> [   42.172011] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 048be4aa6024..0b121f29e3b1 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set(QUEUE_FLAG_DEAD, q);
>  	spin_unlock_irq(lock);
>  
> +	/* respect queue DEAD via quiesce for blk-mq */
> +	if (q->mq_ops)
> +		blk_mq_quiesce_queue(q);
> +
>  	/* for synchronous bio-based driver finish in-flight integrity i/o */
>  	blk_flush_integrity();

Hi Jens,

Could you consider this patch? This issue can be reproduced easily
with heavy IO combined with a device-removal test on SCSI.

-- 
Ming

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-05 15:38 ` Bart Van Assche
  2017-11-06  3:44   ` Ming Lei
@ 2017-11-10 16:30   ` Bart Van Assche
  2017-11-11  2:19     ` Ming Lei
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2017-11-10 16:30 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: osandov, hare

On Sun, 2017-11-05 at 15:38 +0000, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 048be4aa6024..0b121f29e3b1 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> >  	spin_unlock_irq(lock);
> >  
> > +	/* respect queue DEAD via quiesce for blk-mq */
> > +	if (q->mq_ops)
> > +		blk_mq_quiesce_queue(q);
> > +
> >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> >  	blk_flush_integrity();
> 
> Have you considered to change the blk_freeze_queue_start() call in
> blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> advantage that no new if (q->mq_ops) test has to be introduced.

There is an additional reason why I think it would be better to make
blk_set_queue_dying() wait until requests that are in progress have finished.
Some block drivers, e.g. the mtip driver, start cleaning up resources
immediately after blk_set_queue_dying() has returned. So making
blk_set_queue_dying() wait would fix a race condition in at least the mtip
driver.

Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue
  2017-11-10 16:30   ` Bart Van Assche
@ 2017-11-11  2:19     ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2017-11-11  2:19 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, osandov, hare

On Fri, Nov 10, 2017 at 04:30:35PM +0000, Bart Van Assche wrote:
> On Sun, 2017-11-05 at 15:38 +0000, Bart Van Assche wrote:
> > On Sun, 2017-11-05 at 20:10 +0800, Ming Lei wrote:
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 048be4aa6024..0b121f29e3b1 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -658,6 +658,10 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >  	spin_unlock_irq(lock);
> > >  
> > > +	/* respect queue DEAD via quiesce for blk-mq */
> > > +	if (q->mq_ops)
> > > +		blk_mq_quiesce_queue(q);
> > > +
> > >  	/* for synchronous bio-based driver finish in-flight integrity i/o */
> > >  	blk_flush_integrity();
> > 
> > Have you considered to change the blk_freeze_queue_start() call in
> > blk_set_queue_dying() into a blk_freeze_queue() call? That approach has the
> > advantage that no new if (q->mq_ops) test has to be introduced.
> 
> There is an additional reason why I think it would be better to make
> blk_set_queue_dying() wait until requests that are in progress have finished.
> Some block drivers, e.g. the mtip driver, start cleaning up resources
> immediately after blk_set_queue_dying() has returned. So making
> blk_set_queue_dying() wait would fix a race condition in at least the mtip
> driver.

It looks like you haven't explained how your approach fixes this issue.

IMO, your approach can't fix it.

1) the core idea:
when blk_freeze_queue() returns, there may still be an in-progress dispatch (call
it an in-progress IO submission). So once blk_freeze_queue() returns,
blk_cleanup_queue() goes ahead, and the in-progress dispatch may trigger a
use-after-free, or even corrupt memory.

For the detailed reasoning, please see:

	https://marc.info/?l=linux-block&m=150993988115872&w=2

2) suppose you replace blk_freeze_queue_start() with blk_freeze_queue()
in blk_set_queue_dying(); that changes nothing about this issue.

You may think the blk_freeze_queue() called in blk_cleanup_queue() implies a
synchronize_rcu(), which may drain the in-progress dispatch. But the in-progress
dispatch (IO submission) may not have entered the RCU read-side critical section
yet at that time. For example, in one IO submission path the request has just been
added to the scheduler queue, and that request is then dispatched to the LLD via
another path's run queue and completed before the run queue from the current IO
submission actually happens. So the implied synchronize_rcu() won't help this case
at all.
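
The RCU read-side section in question is the one taken around the dispatch
itself; roughly (a simplified sketch, approximate for this kernel version):

static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
{
	int srcu_idx;

	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
		rcu_read_lock();
		blk_mq_sched_dispatch_requests(hctx);
		rcu_read_unlock();
	} else {
		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
		blk_mq_sched_dispatch_requests(hctx);
		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
	}
}

If the submit path described above has not yet entered this read-side section
when the grace period runs, the synchronize_rcu() implied by the freeze has
nothing to wait for, which is the scenario described here.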

Or am I missing your point?

-- 
Ming

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-11  2:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-05 12:10 [PATCH] blk-mq: respect queue dead via blk_mq_quiesce_queue Ming Lei
2017-11-05 15:38 ` Bart Van Assche
2017-11-06  3:44   ` Ming Lei
2017-11-06 16:34     ` Bart Van Assche
2017-11-07  2:27       ` Ming Lei
2017-11-08  2:10         ` Ming Lei
2017-11-10 16:30   ` Bart Van Assche
2017-11-11  2:19     ` Ming Lei
2017-11-10  6:17 ` Ming Lei
