All of lore.kernel.org
* [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
@ 2017-05-27 14:21 Ming Lei
  2017-05-27 14:21 ` [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

Hi,

There are some issues in current blk_mq_quiesce_queue():

    - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch isn't
    prevented after blk_mq_quiesce_queue() returns.
    - in theory, new RCU read-side critical sections may begin while
    synchronize_rcu() is waiting and end after synchronize_rcu()
    returns, so dispatch may still run after synchronize_rcu()
    returns.

It has been observed that request double-free/use-after-free
can be triggered easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable().
The reason is that blk_mq_quiesce_queue() can't prevent
dispatch from running during that period.

We actually have to quiesce the queue before canceling dispatched
requests via blk_mq_tagset_busy_iter(); otherwise a use-after-free
can be triggered easily. This way of canceling dispatched requests
is used in several drivers, but only NVMe uses blk_mq_quiesce_queue()
to avoid the issue, so the others need to be fixed too. It should
become the common way of handling a dead controller.
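
As an illustration only (not kernel code), the quiesce-before-cancel
ordering described above can be modeled in plain userspace C; the
struct and function names below are invented for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal userspace model of a request queue: a quiesced flag plus one
 * in-flight request.  In the kernel the flag check happens inside an
 * RCU read-side critical section; a single thread is enough here to
 * show the ordering that prevents the use-after-free. */
struct fake_req {
	int data;
};

struct fake_queue {
	bool quiesced;
	struct fake_req *inflight;
};

/* Model of the dispatch path: it must not touch a request once the
 * queue is quiesced, otherwise cancel may free it underneath us. */
static bool fake_dispatch(struct fake_queue *q)
{
	if (q->quiesced)
		return false;	/* dispatch blocked, request untouched */
	q->inflight->data++;	/* "issue" the request */
	return true;
}

/* Model of nvme_dev_disable(): quiesce first, then cancel and free. */
static void fake_cancel_all(struct fake_queue *q)
{
	q->quiesced = true;	/* ~ blk_mq_quiesce_queue() */
	free(q->inflight);	/* ~ blk_mq_tagset_busy_iter(..., cancel) */
	q->inflight = NULL;
}
```

After fake_cancel_all(), fake_dispatch() bails out on the flag and
never dereferences the freed request; without the quiesce step first,
the model would read freed memory exactly as the kernel oops shows.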

blk_mq_quiesce_queue() is currently implemented by stopping the queue,
which limits its uses and easily causes races, because any queue
restart in another path may break blk_mq_quiesce_queue(). For example,
we sometimes stop a queue when the hardware can't handle more in-flight
requests and restart it after requests complete. Meanwhile, if we want
to cancel requests because the hardware is dead, quiescing has to run
first, and the restart in the completion path can then break the
quiescing. This patchset improves the interface by decoupling quiescing
from stopping the queue, which makes it easier to use.
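
The restart race above, and why a dedicated quiesce flag fixes it, can
be modeled in plain userspace C; everything below is invented for the
sketch (the real series uses BLK_MQ_S_STOPPED and QUEUE_FLAG_QUIESCED):

```c
#include <assert.h>
#include <stdbool.h>

/* Two independent bits, mirroring the fixed design: "stopped" is a
 * throttling hint that completion paths may clear at any time, while
 * "quiesced" is only cleared by an explicit unquiesce. */
struct model_queue {
	bool stopped;
	bool quiesced;
};

static bool model_can_dispatch(const struct model_queue *q)
{
	/* After the series, dispatch is gated on the quiesce flag as
	 * well, so clearing "stopped" alone cannot re-enable it. */
	return !q->quiesced && !q->stopped;
}

/* Completion path restarting a throttled queue: clears "stopped" only.
 * In the old stop-based design this would have broken quiescing. */
static void model_restart_from_completion(struct model_queue *q)
{
	q->stopped = false;
}

static void model_quiesce(struct model_queue *q)
{
	q->quiesced = true;
}
```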

V2:
	- split patch "blk-mq: fix blk_mq_quiesce_queue" into two and
  	fix one build issue when only applying the 1st two patches.
	- add kernel oops and hang log into commit log
	- add 'Revert "blk-mq: don't use sync workqueue flushing from drivers"'

 
Ming Lei (8):
  blk-mq: introduce blk_mq_unquiesce_queue
  block: introduce flag of QUEUE_FLAG_QUIESCED
  blk-mq: use the introduced blk_mq_unquiesce_queue()
  blk-mq: fix blk_mq_quiesce_queue
  blk-mq: update comments on blk_mq_quiesce_queue()
  blk-mq: don't stop queue for quiescing
  blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  Revert "blk-mq: don't use sync workqueue flushing from drivers"

 block/blk-mq.c           | 89 +++++++++++++++++++++++++++++++++---------------
 drivers/md/dm-rq.c       |  2 +-
 drivers/nvme/host/core.c |  2 +-
 drivers/scsi/scsi_lib.c  |  5 ++-
 include/linux/blkdev.h   |  3 ++
 5 files changed, 71 insertions(+), 30 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-30 15:09   ` Bart Van Assche
  2017-05-27 14:21 ` [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

Currently we use blk_mq_start_stopped_hw_queues() implicitly
as the counterpart of blk_mq_quiesce_queue() for unquiescing the
queue, so introduce blk_mq_unquiesce_queue() as the explicit
counterpart of blk_mq_quiesce_queue().

This function will be used to fix the current quiescing mechanism
in the following patches.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 13 +++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f2224ffd225d..8ecbbb718946 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -181,6 +181,19 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
+/*
+ * blk_mq_unquiesce_queue() - counterpart of blk_mq_quiesce_queue()
+ * @q: request queue.
+ *
+ * Restore the queue to the state it was in before
+ * blk_mq_quiesce_queue() was called.
+ */
+void blk_mq_unquiesce_queue(struct request_queue *q)
+{
+	blk_mq_start_stopped_hw_queues(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab92c4ea138b..41291be82ac4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,6 +964,7 @@ extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
 extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_unquiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
  2017-05-27 14:21 ` [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-30 15:11   ` Bart Van Assche
  2017-05-27 14:21   ` Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

This flag is introduced to fix and improve the queue quiescing code.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 41291be82ac4..60967797f4f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -618,6 +618,7 @@ struct request_queue {
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
 #define QUEUE_FLAG_POLL_STATS  28	/* collecting stats for hybrid polling */
 #define QUEUE_FLAG_REGISTERED  29	/* queue has been registered to a disk */
+#define QUEUE_FLAG_QUIESCED    30	/* queue has been quiesced */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -712,6 +713,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
+#define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
@ 2017-05-27 14:21   ` Ming Lei
  2017-05-27 14:21 ` [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Ming Lei, linux-nvme, linux-scsi, dm-devel

blk_mq_unquiesce_queue() is used for unquiescing the
queue explicitly, so replace blk_mq_start_stopped_hw_queues()
with it.

Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: dm-devel@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c       | 2 +-
 drivers/nvme/host/core.c | 2 +-
 drivers/scsi/scsi_lib.c  | 5 ++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2af27026aa2e..673fcf075077 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	blk_mq_start_stopped_hw_queues(q, true);
+	blk_mq_unquiesce_queue(q);
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 04e115834702..231d36028afc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+		blk_mq_unquiesce_queue(ns->queue);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 814a4bd8405d..72b11f75719c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 		return -EINVAL;
 
 	if (q->mq_ops) {
-		blk_mq_start_stopped_hw_queues(q, false);
+		if (blk_queue_quiesced(q))
+			blk_mq_unquiesce_queue(q);
+		else
+			blk_mq_start_stopped_hw_queues(q, false);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_start_queue(q);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread


* [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (2 preceding siblings ...)
  2017-05-27 14:21   ` Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-27 21:46   ` Bart Van Assche
  2017-05-27 14:21 ` [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns, and we don't have such a requirement
for the queue-stopping APIs.

But blk_mq_quiesce_queue() may still fail to block/drain dispatch in
the following cases:

- direct issue or BLK_MQ_S_START_ON_RUN
- in theory, new RCU read-side critical sections may begin while
synchronize_rcu() is waiting and end after synchronize_rcu()
returns, and dispatch may still happen during that window

So introduce the new flag QUEUE_FLAG_QUIESCED and evaluate it
inside the RCU read-side critical sections to fix the above issues.

This patch fixes request use-after-free [1][2] when canceling NVMe
requests in nvme_dev_disable(), which can be triggered easily during
NVMe reset & remove tests.

[1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on
[  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[  103.412980] IP: bio_integrity_advance+0x48/0xf0
[  103.412981] PGD 275a88067
[  103.412981] P4D 275a88067
[  103.412982] PUD 276c43067
[  103.412983] PMD 0
[  103.412984]
[  103.412986] Oops: 0000 [#1] SMP
[  103.412989] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
[  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  103.413043] task: ffff9cc8775c8000 task.stack: ffffc033c252c000
[  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
[  103.413046] RSP: 0018:ffffc033c252fc10 EFLAGS: 00010202
[  103.413048] RAX: 0000000000000000 RBX: ffff9cc8720a8cc0 RCX: ffff9cca72958240
[  103.413049] RDX: ffff9cca72958000 RSI: 0000000000000008 RDI: ffff9cc872537f00
[  103.413049] RBP: ffffc033c252fc28 R08: 0000000000000000 R09: ffffffffb963a0d5
[  103.413050] R10: 000000000000063e R11: 0000000000000000 R12: ffff9cc8720a8d18
[  103.413051] R13: 0000000000001000 R14: ffff9cc872682e00 R15: 00000000fffffffb
[  103.413053] FS:  0000000000000000(0000) GS:ffff9cc877c00000(0000) knlGS:0000000000000000
[  103.413054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.413055] CR2: 000000000000000a CR3: 0000000276c41000 CR4: 00000000001406f0
[  103.413056] Call Trace:
[  103.413063]  bio_advance+0x2a/0xe0
[  103.413067]  blk_update_request+0x76/0x330
[  103.413072]  blk_mq_end_request+0x1a/0x70
[  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
[  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
[  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
[  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
[  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
[  103.413088]  blk_mq_start_hw_queue+0x17/0x20
[  103.413090]  blk_mq_start_hw_queues+0x32/0x50
[  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
[  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
[  103.413103]  process_one_work+0x149/0x360
[  103.413105]  worker_thread+0x4d/0x3c0
[  103.413109]  kthread+0x109/0x140
[  103.413111]  ? rescuer_thread+0x380/0x380
[  103.413113]  ? kthread_park+0x60/0x60
[  103.413120]  ret_from_fork+0x2c/0x40
[  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
[  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: ffffc033c252fc10
[  103.413146] CR2: 000000000000000a
[  103.413157] ---[ end trace cd6875d16eb5a11e ]---
[  103.455368] Kernel panic - not syncing: Fatal exception
[  103.459826] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
[  103.857637] sched: Unexpected reschedule of offline CPU#1!
[  103.863762] ------------[ cut here ]------------

[2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off
[  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
[  247.137311]       Not tainted 4.12.0-rc2.upstream+ #4
[  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.151704] Call Trace:
[  247.154445]  __schedule+0x28a/0x880
[  247.158341]  schedule+0x36/0x80
[  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  247.166913]  ? remove_wait_queue+0x60/0x60
[  247.171485]  blk_freeze_queue+0x1a/0x20
[  247.175770]  blk_cleanup_queue+0x7f/0x140
[  247.180252]  nvme_ns_remove+0xa3/0xb0 [nvme_core]
[  247.185503]  nvme_remove_namespaces+0x32/0x50 [nvme_core]
[  247.191532]  nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
[  247.196977]  nvme_remove+0x70/0x110 [nvme]
[  247.201545]  pci_device_remove+0x39/0xc0
[  247.205927]  device_release_driver_internal+0x141/0x200
[  247.211761]  device_release_driver+0x12/0x20
[  247.216531]  pci_stop_bus_device+0x8c/0xa0
[  247.221104]  pci_stop_and_remove_bus_device_locked+0x1a/0x30
[  247.227420]  remove_store+0x7c/0x90
[  247.231320]  dev_attr_store+0x18/0x30
[  247.235409]  sysfs_kf_write+0x3a/0x50
[  247.239497]  kernfs_fop_write+0xff/0x180
[  247.243867]  __vfs_write+0x37/0x160
[  247.247757]  ? selinux_file_permission+0xe5/0x120
[  247.253011]  ? security_file_permission+0x3b/0xc0
[  247.258260]  vfs_write+0xb2/0x1b0
[  247.261964]  ? syscall_trace_enter+0x1d0/0x2b0
[  247.266924]  SyS_write+0x55/0xc0
[  247.270540]  do_syscall_64+0x67/0x150
[  247.274636]  entry_SYSCALL64_slow_path+0x25/0x25
[  247.279794] RIP: 0033:0x7f5c96740840
[  247.283785] RSP: 002b:00007ffd00e87ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  247.292238] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5c96740840
[  247.300194] RDX: 0000000000000002 RSI: 00007f5c97060000 RDI: 0000000000000001
[  247.308159] RBP: 00007f5c97060000 R08: 000000000000000a R09: 00007f5c97059740
[  247.316123] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5c96a14400
[  247.324087] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[  370.016340] INFO: task nvme-test:1772 blocked for more than 120 seconds.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8ecbbb718946..470ee5514ea9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -170,6 +170,10 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 
 	__blk_mq_stop_hw_queues(q, true);
 
+	spin_lock_irq(q->queue_lock);
+	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irq(q->queue_lock);
+
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
 			synchronize_srcu(&hctx->queue_rq_srcu);
@@ -190,6 +194,10 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
+	spin_lock_irq(q->queue_lock);
+	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irq(q->queue_lock);
+
 	blk_mq_start_stopped_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
@@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	 * the queue are notified as well.
 	 */
 	wake_up_all(&q->mq_freeze_wq);
+
+	/* Forcibly unquiesce the queue to avoid having stuck requests */
+	blk_mq_unquiesce_queue(q);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
@@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		blk_mq_sched_dispatch_requests(hctx);
+		if (!blk_queue_quiesced(hctx->queue))
+			blk_mq_sched_dispatch_requests(hctx);
 		rcu_read_unlock();
 	} else {
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
-		blk_mq_sched_dispatch_requests(hctx);
+		if (!blk_queue_quiesced(hctx->queue))
+			blk_mq_sched_dispatch_requests(hctx);
 		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
 	}
 }
@@ -1519,9 +1532,14 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
 static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, blk_qc_t *cookie)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+	bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
+	bool quiesced;
+
+	if (!blocking) {
 		rcu_read_lock();
-		__blk_mq_try_issue_directly(rq, cookie, false);
+		quiesced = blk_queue_quiesced(rq->q);
+		if (!quiesced)
+			__blk_mq_try_issue_directly(rq, cookie, false);
 		rcu_read_unlock();
 	} else {
 		unsigned int srcu_idx;
@@ -1529,9 +1547,14 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
-		__blk_mq_try_issue_directly(rq, cookie, true);
+		quiesced = blk_queue_quiesced(rq->q);
+		if (!quiesced)
+			__blk_mq_try_issue_directly(rq, cookie, true);
 		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
 	}
+
+	if (quiesced)
+		blk_mq_sched_insert_request(rq, false, false, false, blocking);
 }
 
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue()
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2017-05-27 14:21 ` [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-30 17:14   ` Bart Van Assche
  2017-05-27 14:21 ` [PATCH v2 6/8] blk-mq: don't stop queue for quiescing Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

Actually what we want from blk_mq_quiesce_queue()
isn't only to wait for completion of all ongoing .queue_rq() calls.

In the typical context of canceling requests, we need to
make sure that the following is done in the dispatch path
before starting to cancel requests:

	- a failed dispatched request is finished
	- a busy dispatched request is requeued, and the STARTED
	flag is cleared

So update the comment to keep the code, documentation and our
expectation consistent.
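
The dispatch-path guarantee described above can be modeled in plain
userspace C (all names are invented for this sketch; the real states
live in the request's atomic flags):

```c
#include <assert.h>

/* By the time quiescing completes, a failed request has been finished
 * and a busy one has been requeued with its STARTED state dropped, so
 * the cancel path never sees a half-dispatched request. */
enum drv_status { DRV_OK, DRV_BUSY, DRV_ERROR };

enum req_state { REQ_IDLE, REQ_STARTED, REQ_QUEUED, REQ_DONE };

struct model_req {
	enum req_state state;
};

static void model_dispatch_one(struct model_req *rq, enum drv_status ret)
{
	rq->state = REQ_STARTED;	/* ~ blk_mq_start_request() */

	switch (ret) {			/* ~ .queue_rq() return value */
	case DRV_BUSY:
		rq->state = REQ_QUEUED;	/* requeued, STARTED cleared */
		break;
	case DRV_ERROR:
		rq->state = REQ_DONE;	/* failed request is finished */
		break;
	case DRV_OK:
		break;			/* stays STARTED until completion */
	}
}
```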

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 470ee5514ea9..032045841856 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -155,12 +155,12 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatching have finished
  * @q: request queue.
  *
  * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
+ * callback function is invoked. Once this function returns, it is
+ * guaranteed that no dispatching can happen.
  */
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (4 preceding siblings ...)
  2017-05-27 14:21 ` [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-27 21:49   ` Bart Van Assche
  2017-05-27 14:21 ` [PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

A queue can be started by other blk-mq APIs and used in
different cases; this limits the uses of blk_mq_quiesce_queue()
if it is based on stopping the queue, and makes its usage very
difficult, especially since users have to use the stop-queue APIs
carefully to avoid breaking blk_mq_quiesce_queue().

We have applied the QUIESCED flag for draining and blocking
dispatch, so it isn't necessary to stop the queue any more.

After the queue stopping is removed, blk_mq_quiesce_queue() can
be used safely and easily, and users won't have to worry about the
queue being restarted during quiescing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 032045841856..84cce67caee3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	unsigned int i;
 	bool rcu = false;
 
-	__blk_mq_stop_hw_queues(q, true);
-
 	spin_lock_irq(q->queue_lock);
 	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
@@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
 
-	blk_mq_start_stopped_hw_queues(q, true);
+	/*
+	 * During quiescing, requests can be inserted
+	 * into the scheduler queue or the sw queue,
+	 * so run the queues to dispatch these requests.
+	 */
+	blk_mq_start_hw_queues(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (5 preceding siblings ...)
  2017-05-27 14:21 ` [PATCH v2 6/8] blk-mq: don't stop queue for quiescing Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-27 14:21 ` [PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei
  2017-05-27 21:32 ` [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Bart Van Assche
  8 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

BLK_MQ_S_STOPPED may not be observed by other concurrent I/O paths,
so we can't guarantee that dispatching won't happen after the
queue-stopping APIs return.

Clarify this fact to avoid potential misuse.
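
The limitation can be modeled in plain userspace C (names invented for
the sketch): a dispatch that has already passed its one-time stopped
check keeps going, so stopping the queue neither blocks nor drains it.

```c
#include <assert.h>
#include <stdbool.h>

struct stop_model_queue {
	bool stopped;
};

static void stop_model_stop(struct stop_model_queue *q)
{
	q->stopped = true;		/* ~ blk_mq_stop_hw_queue() */
}

/* The stopped bit is tested once, at entry.  The mid_flight callback
 * models another path stopping the queue while dispatch is already
 * running: the in-flight dispatch issues the request anyway. */
static bool stop_model_dispatch(struct stop_model_queue *q,
				void (*mid_flight)(struct stop_model_queue *))
{
	bool observed_stopped = q->stopped;	/* checked once, at entry */

	if (observed_stopped)
		return false;
	if (mid_flight)
		mid_flight(q);			/* racing stop: no effect */
	return true;				/* request issued anyway */
}
```

This is why the series points users who need a hard guarantee at
blk_mq_quiesce_queue() instead of the stop-queue APIs.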

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 84cce67caee3..73e07bc074a6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1240,6 +1240,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queue() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	__blk_mq_stop_hw_queue(hctx, false);
@@ -1255,6 +1260,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
 		__blk_mq_stop_hw_queue(hctx, sync);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queues() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
 	__blk_mq_stop_hw_queues(q, false);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers"
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (6 preceding siblings ...)
  2017-05-27 14:21 ` [PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
@ 2017-05-27 14:21 ` Ming Lei
  2017-05-27 21:32 ` [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Bart Van Assche
  8 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-27 14:21 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

This reverts commit 2719aa217e0d ("blk-mq: don't use
sync workqueue flushing from drivers") because only
blk_mq_quiesce_queue() needs the sync flush, and we no longer
stop the queue, so revert it.

Also switch blk_mq_stop_hw_queue() to cancel_delayed_work().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 73e07bc074a6..f38226bd651d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -42,7 +42,6 @@ static LIST_HEAD(all_q_list);
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1230,16 +1229,6 @@ bool blk_mq_queue_stopped(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_queue_stopped);
 
-static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
-{
-	if (sync)
-		cancel_delayed_work_sync(&hctx->run_work);
-	else
-		cancel_delayed_work(&hctx->run_work);
-
-	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
-}
-
 /*
  * We do not guarantee that dispatch can be drained or blocked
  * after blk_mq_stop_hw_queue() returns. Please use
@@ -1247,18 +1236,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
  */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	__blk_mq_stop_hw_queue(hctx, false);
-}
-EXPORT_SYMBOL(blk_mq_stop_hw_queue);
+	cancel_delayed_work(&hctx->run_work);
 
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
-{
-	struct blk_mq_hw_ctx *hctx;
-	int i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		__blk_mq_stop_hw_queue(hctx, sync);
+	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
+EXPORT_SYMBOL(blk_mq_stop_hw_queue);
 
 /*
  * We do not guarantee that dispatch can be drained or blocked
@@ -1267,7 +1249,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
  */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
-	__blk_mq_stop_hw_queues(q, false);
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		blk_mq_stop_hw_queue(hctx);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queues);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
  2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (7 preceding siblings ...)
  2017-05-27 14:21 ` [PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei
@ 2017-05-27 21:32 ` Bart Van Assche
  2017-05-28 11:11   ` Ming Lei
  8 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-27 21:32 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> There are some issues in current blk_mq_quiesce_queue():
> 
>     - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
>     be prevented after blk_mq_quiesce_queue() is returned.
>     - in theory, new RCU read-side critical sections may begin while
>     synchronize_rcu() was waiting, and end after returning from 
>     synchronize_rcu(), then dispatch still may be run after
>     synchronize_rcu() returns

Hello Ming,

I disagree with the second part of your statement. blk_mq_quiesce_queue()
stops all hardware queues before calling synchronize_*rcu().
blk_mq_sched_dispatch_requests() is called with an RCU lock held and checks
whether a hctx has been stopped before calling .queue_rq(). That is
sufficient to prevent blk_mq_try_issue_directly() to trigger a .queue_rq()
call.

Bart.

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-27 14:21 ` [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue Ming Lei
@ 2017-05-27 21:46   ` Bart Van Assche
  2017-05-28 10:44     ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-27 21:46 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> It is required that no dispatch can happen any more once
> blk_mq_quiesce_queue() returns, and we don't have such requirement
> on APIs of stopping queue.
> 
> But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> following cases:
> 
> - direct issue or BLK_MQ_S_START_ON_RUN
> - in theory, new RCU read-side critical sections may begin while
> synchronize_rcu() was waiting, and end after synchronize_rcu()
> returns, during the period dispatch still may happen

Hello Ming,

I think the title and the description of this patch are wrong. Since
the current queue quiescing mechanism works fine for drivers that do
not stop and restart a queue (e.g. SCSI and dm-core), please change the
title and description to reflect that the purpose of this patch is
to allow drivers that use the quiesce mechanism to restart a queue
without unquiescing it.

> @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
>  	 * the queue are notified as well.
>  	 */
>  	wake_up_all(&q->mq_freeze_wq);
> +
> +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> +	blk_mq_unquiesce_queue(q);
>  }

Should the block layer unquiesce a queue if a block driver hasn't 
done that before queue removal starts or should the block driver
itself do that? The block layer doesn't restart stopped queues from
inside blk_set_queue_dying() so why should it unquiesce a quiesced
queue?
 
>  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  
>  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
>  		rcu_read_lock();
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		rcu_read_unlock();
>  	} else {
>  		might_sleep();
>  
>  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> -		blk_mq_sched_dispatch_requests(hctx);
> +		if (!blk_queue_quiesced(hctx->queue))
> +			blk_mq_sched_dispatch_requests(hctx);
>  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
>  	}
>  }

Sorry but I don't like these changes. Why have the blk_queue_quiesced()
calls be added at other code locations than the blk_mq_hctx_stopped() calls?
This will make the block layer unnecessary hard to maintain. Please consider
to change the blk_mq_hctx_stopped(hctx) calls in blk_mq_sched_dispatch_requests()
and *blk_mq_*run_hw_queue*() into blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

Thanks,

Bart.

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-27 14:21 ` [PATCH v2 6/8] blk-mq: don't stop queue for quiescing Ming Lei
@ 2017-05-27 21:49   ` Bart Van Assche
  2017-05-28 10:50     ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-27 21:49 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 032045841856..84cce67caee3 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
>  	unsigned int i;
>  	bool rcu = false;
>  
> -	__blk_mq_stop_hw_queues(q, true);
> -
>  	spin_lock_irq(q->queue_lock);
>  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
>  	spin_unlock_irq(q->queue_lock);
> @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>  	spin_unlock_irq(q->queue_lock);
>  
> -	blk_mq_start_stopped_hw_queues(q, true);
> +	/*
> +	 * During quiescing, requests can be inserted
> +	 * to scheduler's queue or sw queue, so we run
> +	 * queues for dispatching these requests.
> +	 */
> +	blk_mq_start_hw_queues(q);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);

Hello Ming,

This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
hardware queues, why should blk_mq_unquiesce_queue() start any hardware
queues?

Bart.

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-27 21:46   ` Bart Van Assche
@ 2017-05-28 10:44     ` Ming Lei
  2017-05-28 16:10       ` Bart Van Assche
  2017-05-30 19:23       ` Bart Van Assche
  0 siblings, 2 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-28 10:44 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe


On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > It is required that no dispatch can happen any more once
> > blk_mq_quiesce_queue() returns, and we don't have such requirement
> > on APIs of stopping queue.
> > 
> > But blk_mq_quiesce_queue() still may not block/drain dispatch in the
> > following cases:
> > 
> > - direct issue or BLK_MQ_S_START_ON_RUN
> > - in theory, new RCU read-side critical sections may begin while
> > synchronize_rcu() was waiting, and end after synchronize_rcu()
> > returns, during the period dispatch still may happen
> 
> Hello Ming,

Hello Bart,

> 
> I think the title and the description of this patch are wrong. Since
> the current queue quiescing mechanism works fine for drivers that do
> not stop and restart a queue (e.g. SCSI and dm-core), please change the

I have already described the issues in the current quiesce mechanism; let me post them again:

	But blk_mq_quiesce_queue() still may not block/drain dispatch in the
	following cases:
	
	- direct issue or BLK_MQ_S_START_ON_RUN
	- in theory, new RCU read-side critical sections may begin while
	synchronize_rcu() was waiting, and end after synchronize_rcu()
	returns, during the period dispatch still may happen

Unlike stopping the queue, all dispatching has to be drained/blocked
by the time synchronize_rcu() returns; otherwise a double free or
use-after-free can be triggered, which has already been observed on
NVMe.

> title and description to reflect that the purpose of this patch is
> to allow drivers that use the quiesce mechanism to restart a queue
> without unquiescing it.

First of all it is really a fix, and then an improvement, so could you
tell me what is wrong with the title and the description?

> 
> > @@ -209,6 +217,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
> >  	 * the queue are notified as well.
> >  	 */
> >  	wake_up_all(&q->mq_freeze_wq);
> > +
> > +	/* Forcibly unquiesce the queue to avoid having stuck requests */
> > +	blk_mq_unquiesce_queue(q);
> >  }
> 
> Should the block layer unquiesce a queue if a block driver hasn't 
> done that before queue removal starts or should the block driver
> itself do that?

Some drivers, such as NVMe, may quiesce a queue and never unquiesce it.

OK, I will consider fixing the drivers first.

> The block layer doesn't restart stopped queues from
> inside blk_set_queue_dying() so why should it unquiesce a quiesced
> queue?

If the quiesced queue isn't unquiesced, it may cause an I/O hang, since
any I/O in the sw queue/scheduler queue can't be completed at all.

OK, I will fix the drivers in the next post.

Actually the queue has to be started after blk_set_queue_dying(),
otherwise it can cause an I/O hang too, since there can be lots of
writeback I/O during the following del_gendisk(). We have done that
in NVMe already, see nvme_kill_queues().

Maybe in the future we should consider doing all of that in the block layer.

>  
> >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  
> >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> >  		rcu_read_lock();
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		rcu_read_unlock();
> >  	} else {
> >  		might_sleep();
> >  
> >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > -		blk_mq_sched_dispatch_requests(hctx);
> > +		if (!blk_queue_quiesced(hctx->queue))
> > +			blk_mq_sched_dispatch_requests(hctx);
> >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> >  	}
> >  }
> 
> Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> calls be added at other code locations than the blk_mq_hctx_stopped() calls?
> This will make the block layer unnecessary hard to maintain. Please consider
> to change the blk_mq_hctx_stopped(hctx) calls in blk_mq_sched_dispatch_requests()
> and *blk_mq_*run_hw_queue*() into blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).

One benefit is that we make it explicit that the flag has to be checked
inside the RCU read-side critical sections. If the check is put somewhere
else, someone may move it out of the read-side critical sections in the
future.


Thanks,
Ming

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-27 21:49   ` Bart Van Assche
@ 2017-05-28 10:50     ` Ming Lei
  2017-05-28 16:03       ` Bart Van Assche
  0 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-28 10:50 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Sat, May 27, 2017 at 09:49:27PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 032045841856..84cce67caee3 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> >  	unsigned int i;
> >  	bool rcu = false;
> >  
> > -	__blk_mq_stop_hw_queues(q, true);
> > -
> >  	spin_lock_irq(q->queue_lock);
> >  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> >  	spin_unlock_irq(q->queue_lock);
> > @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> >  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> >  	spin_unlock_irq(q->queue_lock);
> >  
> > -	blk_mq_start_stopped_hw_queues(q, true);
> > +	/*
> > +	 * During quiescing, requests can be inserted
> > +	 * to scheduler's queue or sw queue, so we run
> > +	 * queues for dispatching these requests.
> > +	 */
> > +	blk_mq_start_hw_queues(q);
> >  }
> >  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> 
> Hello Ming,

Hello Bart,

> 
> This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
> hardware queues, why should blk_mq_unquiesce_queue() start any hardware
> queues?

Please see the above comment: especially in the direct issue path, we
have to insert the request into the sw/scheduler queue when the queue is
quiesced, and these queued requests have to be dispatched during
unquiescing.

I considered putting the current context to sleep in this situation, but
that turns out to be more complicated to handle; given that we have a
very limited queue depth, it is simpler to let applications consume it
all, then wait.

Thanks,
Ming

* Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
  2017-05-27 21:32 ` [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Bart Van Assche
@ 2017-05-28 11:11   ` Ming Lei
  2017-05-28 16:01     ` Bart Van Assche
  0 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-28 11:11 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Sat, May 27, 2017 at 09:32:02PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > There are some issues in current blk_mq_quiesce_queue():
> > 
> >     - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
> >     be prevented after blk_mq_quiesce_queue() is returned.
> >     - in theory, new RCU read-side critical sections may begin while
> >     synchronize_rcu() was waiting, and end after returning from 
> >     synchronize_rcu(), then dispatch still may be run after
> >     synchronize_rcu() returns
> 
> Hello Ming,

Hello Bart,

> 
> I disagree with the second part of your statement. blk_mq_quiesce_queue()

You can find a similar description at lines 158 to 180 of
Documentation/RCU/whatisRCU.txt.

For example, there may be preemption or IRQ handling between checking
the stopped flag (observed as false) and entering the RCU read-side
critical section, and synchronize_*rcu() may return before that RCU
read-side critical section runs. That is why we have to move the check
into the RCU read-side critical section.

> stops all hardware queues before calling synchronize_*rcu().
> blk_mq_sched_dispatch_requests() is called with an RCU lock held and checks
> whether a hctx has been stopped before calling .queue_rq(). That is
> sufficient to prevent blk_mq_try_issue_directly() to trigger a .queue_rq()
> call.

Our case should be very similar to section "6.  ANALOGY WITH READER-WRITER
LOCKING" in Documentation/RCU/whatisRCU.txt.

Thanks,
Ming

* Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
  2017-05-28 11:11   ` Ming Lei
@ 2017-05-28 16:01     ` Bart Van Assche
  2017-05-30  0:34       ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-28 16:01 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Sun, 2017-05-28 at 19:11 +0800, Ming Lei wrote:
> On Sat, May 27, 2017 at 09:32:02PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > There are some issues in current blk_mq_quiesce_queue():
> > > 
> > >     - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
> > >     be prevented after blk_mq_quiesce_queue() is returned.
> > >     - in theory, new RCU read-side critical sections may begin while
> > >     synchronize_rcu() was waiting, and end after returning from 
> > >     synchronize_rcu(), then dispatch still may be run after
> > >     synchronize_rcu() returns
> > 
> > I disagree with the second part of your statement. blk_mq_quiesce_queue()
> 
> You can find the similar description from line 158 to line 180.
> of Documentation/RCU/whatisRCU.txt.
> 
> For example, there may be schedule preempt or irq handling between
> checking stopped flag(false) and the RCU read-side critical sections, and
> synchronize_*rcu() may return before running this RCU read-side critical
> sections. That is why we have to move the check into RCU read-side
> critical sections.

Hello Ming,

Have you noticed that in the current blk-mq code the test of the hctx
stopped flag already occurs *inside* an RCU read-side critical section?
__blk_mq_run_hw_queue() grabs an RCU read lock before
blk_mq_sched_dispatch_requests() calls blk_mq_hctx_stopped().

Bart.

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-28 10:50     ` Ming Lei
@ 2017-05-28 16:03       ` Bart Van Assche
  2017-05-30  0:27         ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-28 16:03 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Sun, 2017-05-28 at 18:50 +0800, Ming Lei wrote:
> On Sat, May 27, 2017 at 09:49:27PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 032045841856..84cce67caee3 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> > >  	unsigned int i;
> > >  	bool rcu = false;
> > >  
> > > -	__blk_mq_stop_hw_queues(q, true);
> > > -
> > >  	spin_lock_irq(q->queue_lock);
> > >  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > >  	spin_unlock_irq(q->queue_lock);
> > > @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > >  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > >  	spin_unlock_irq(q->queue_lock);
> > >  
> > > -	blk_mq_start_stopped_hw_queues(q, true);
> > > +	/*
> > > +	 * During quiescing, requests can be inserted
> > > +	 * to scheduler's queue or sw queue, so we run
> > > +	 * queues for dispatching these requests.
> > > +	 */
> > > +	blk_mq_start_hw_queues(q);
> > >  }
> > >  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> > 
> > This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
> > hardware queues, why should blk_mq_unquiesce_queue() start any hardware
> > queues?
> 
> Please see the above comment, especially in direct issue path, we have to
> insert the request into sw/scheduler queue when queue is quiesced,
> and these queued requests have to be dispatched out during unquiescing.
> 
> I considered to sleep the current context under this situation, but
> turns out it is more complicated to handle, given we have very limited
> queue depth, just let applications consumes all, then wait.

Hello Ming,

The above seems to me to be an argument to *run* the queue from inside
blk_mq_unquiesce_queue() instead of *starting* stopped queues from inside
that function.

Bart.

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-28 10:44     ` Ming Lei
@ 2017-05-28 16:10       ` Bart Van Assche
  2017-05-30  0:22         ` Ming Lei
  2017-05-30 19:23       ` Bart Van Assche
  1 sibling, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-28 16:10 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > >  
> > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > >  		rcu_read_lock();
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		rcu_read_unlock();
> > >  	} else {
> > >  		might_sleep();
> > >  
> > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > -		blk_mq_sched_dispatch_requests(hctx);
> > > +		if (!blk_queue_quiesced(hctx->queue))
> > > +			blk_mq_sched_dispatch_requests(hctx);
> > >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > >  	}
> > >  }
> > 
> > Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> > calls be added at other code locations than the blk_mq_hctx_stopped() calls?
> > This will make the block layer unnecessary hard to maintain. Please consider
> > to change the blk_mq_hctx_stopped(hctx) calls in blk_mq_sched_dispatch_requests()
> > and *blk_mq_*run_hw_queue*() into blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
> 
> One benefit is that we make it explicit that the flag has to be checked
> inside the RCU read-side critical sections. If you put it somewhere,
> someone may put it out of read-side critical sections in future.

Hello Ming,

I really would like to see the blk_queue_quiesced() tests as close as possible to
the blk_mq_hctx_stopped() tests. But I agree that we need a way to document and/or
verify that these tests occur with an RCU read-side lock held. Have you considered
to use rcu_read_lock_held() to document this?

Thanks,

Bart.

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-28 16:10       ` Bart Van Assche
@ 2017-05-30  0:22         ` Ming Lei
  2017-05-30 16:54           ` Bart Van Assche
  0 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-30  0:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> > On Sat, May 27, 2017 at 09:46:45PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > >  bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
> > > > @@ -1108,13 +1119,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> > > >  
> > > >  	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
> > > >  		rcu_read_lock();
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > >  		rcu_read_unlock();
> > > >  	} else {
> > > >  		might_sleep();
> > > >  
> > > >  		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
> > > > -		blk_mq_sched_dispatch_requests(hctx);
> > > > +		if (!blk_queue_quiesced(hctx->queue))
> > > > +			blk_mq_sched_dispatch_requests(hctx);
> > > >  		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
> > > >  	}
> > > >  }
> > > 
> > > Sorry but I don't like these changes. Why have the blk_queue_quiesced()
> > > calls be added at other code locations than the blk_mq_hctx_stopped() calls?
> > > This will make the block layer unnecessary hard to maintain. Please consider
> > > to change the blk_mq_hctx_stopped(hctx) calls in blk_mq_sched_dispatch_requests()
> > > and *blk_mq_*run_hw_queue*() into blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q).
> > 
> > One benefit is that we make it explicit that the flag has to be checked
> > inside the RCU read-side critical sections. If you put it somewhere,
> > someone may put it out of read-side critical sections in future.
> 
> Hello Ming,
> 
> I really would like to see the blk_queue_quiesced() tests as close as possible to
> the blk_mq_hctx_stopped() tests. But I agree that we need a way to document and/or

Could you explain why we have to do that? Checking the stopped state
doesn't require holding the RCU/SRCU read lock, and the two states are
really different.

> verify that these tests occur with an RCU read-side lock held. Have you considered
> to use rcu_read_lock_held() to document this?

Then we would need to check whether it is RCU or SRCU, which makes the
code as ugly as the current check on BLOCKING.

Thanks,
Ming

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-28 16:03       ` Bart Van Assche
@ 2017-05-30  0:27         ` Ming Lei
  2017-05-30 17:02           ` Bart Van Assche
  0 siblings, 1 reply; 42+ messages in thread
From: Ming Lei @ 2017-05-30  0:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Sun, May 28, 2017 at 04:03:20PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 18:50 +0800, Ming Lei wrote:
> > On Sat, May 27, 2017 at 09:49:27PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > > index 032045841856..84cce67caee3 100644
> > > > --- a/block/blk-mq.c
> > > > +++ b/block/blk-mq.c
> > > > @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> > > >  	unsigned int i;
> > > >  	bool rcu = false;
> > > >  
> > > > -	__blk_mq_stop_hw_queues(q, true);
> > > > -
> > > >  	spin_lock_irq(q->queue_lock);
> > > >  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > > >  	spin_unlock_irq(q->queue_lock);
> > > > @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > > >  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > > >  	spin_unlock_irq(q->queue_lock);
> > > >  
> > > > -	blk_mq_start_stopped_hw_queues(q, true);
> > > > +	/*
> > > > +	 * During quiescing, requests can be inserted
> > > > +	 * to scheduler's queue or sw queue, so we run
> > > > +	 * queues for dispatching these requests.
> > > > +	 */
> > > > +	blk_mq_start_hw_queues(q);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> > > 
> > > This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
> > > hardware queues, why should blk_mq_unquiesce_queue() start any hardware
> > > queues?
> > 
> > Please see the above comment, especially in direct issue path, we have to
> > insert the request into sw/scheduler queue when queue is quiesced,
> > and these queued requests have to be dispatched out during unquiescing.
> > 
> > I considered to sleep the current context under this situation, but
> > turns out it is more complicated to handle, given we have very limited
> > queue depth, just let applications consumes all, then wait.
> 
> Hello Ming,
> 
> The above seems to me to be an argument to *run* the queue from inside
> blk_mq_unquiesce_queue() instead of *starting* stopped queues from inside
> that function.

Yes, it does run the queues, as you can see in my comment.

The reality is that we can't run a queue without clearing the STOPPED
state.


Thanks,
Ming

* Re: [PATCH v2 0/8] blk-mq: fix & improve queue quiescing
  2017-05-28 16:01     ` Bart Van Assche
@ 2017-05-30  0:34       ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-30  0:34 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Sun, May 28, 2017 at 04:01:13PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 19:11 +0800, Ming Lei wrote:
> > On Sat, May 27, 2017 at 09:32:02PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > > There are some issues in current blk_mq_quiesce_queue():
> > > > 
> > > >     - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch won't
> > > >     be prevented after blk_mq_quiesce_queue() is returned.
> > > >     - in theory, new RCU read-side critical sections may begin while
> > > >     synchronize_rcu() was waiting, and end after returning from 
> > > >     synchronize_rcu(), then dispatch still may be run after
> > > >     synchronize_rcu() returns
> > > 
> > > I disagree with the second part of your statement. blk_mq_quiesce_queue()
> > 
> > You can find the similar description from line 158 to line 180.
> > of Documentation/RCU/whatisRCU.txt.
> > 
> > For example, there may be schedule preempt or irq handling between
> > checking stopped flag(false) and the RCU read-side critical sections, and
> > synchronize_*rcu() may return before running this RCU read-side critical
> > sections. That is why we have to move the check into RCU read-side
> > critical sections.
> 
> Hello Ming,
> 
> Have you noticed that in the current blk-mq code the test of the hctx
> stopped flag already occurs *inside* an RCU read-side critical section?
> __blk_mq_run_hw_queue() grabs an RCU read lock before
> blk_mq_sched_dispatch_requests() calls blk_mq_hctx_stopped().

OK, that is what I missed; I will update the commit log in the next post.

Thanks,
Ming

* Re: [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue
  2017-05-27 14:21 ` [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
@ 2017-05-30 15:09   ` Bart Van Assche
  0 siblings, 0 replies; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 15:09 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> + * blk_mq_unquiesce_queue() - pair of blk_mq_quiesce_queue()
                                 ^^^^

Sorry but that comment is incomprehensible to me. Did you perhaps mean
"counterpart" instead of "pair"?

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED
  2017-05-27 14:21 ` [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
@ 2017-05-30 15:11   ` Bart Van Assche
  0 siblings, 0 replies; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 15:11 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> This flag is introduced for fixing & improving the quiescing code.

Hello Ming,

Since the quiescing code works fine, a better description would be that the
purpose of this flag is to allow block drivers to use both the queue stopping
and queue quiescing mechanisms instead of only one of these two mechanisms.

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-27 14:21   ` Ming Lei
  (?)
@ 2017-05-30 15:12     ` Bart Van Assche
  -1 siblings, 0 replies; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 15:12 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: linux-scsi, dm-devel, linux-nvme

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
>  		return -EINVAL;
>  
>  	if (q->mq_ops) {
> -		blk_mq_start_stopped_hw_queues(q, false);
> +		if (blk_queue_quiesced(q))
> +			blk_mq_unquiesce_queue(q);
> +		else
> +			blk_mq_start_stopped_hw_queues(q, false);
>  	} else {
>  		spin_lock_irqsave(q->queue_lock, flags);
>  		blk_start_queue(q);

Hello Ming,

Sorry but that change looks wrong to me. All what's needed here is a call
to blk_mq_unquiesce_queue().

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-30  0:22         ` Ming Lei
@ 2017-05-30 16:54           ` Bart Van Assche
  2017-05-31  2:38             ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 16:54 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > I really would like to see the blk_queue_quiesced() tests as close as possible to
> > the blk_mq_hctx_stopped() tests. But I agree that we need a way to document and/or
> 
> Could you explain why we have to do that? And checking on stopped state
> doesn't need to hold RCU/SRCU read lock, and that two states are really
> different.

I'm really surprised that you ask me why ... It's because the purpose of the
"stopped" and "quiesced" flags is similar, namely preventing that dispatching
requests happens. It doesn't matter that with your patches applied it is no
longer needed to hold an RCU / SRCU lock when testing the stopped flag.

> > verify that these tests occur with an RCU read-side lock held. Have you considered
> > to use rcu_read_lock_held() to document this?
> 
> Then we need to check if it is RCU or SRCU, and make code ugly as
> current check on BLOCKING.

How about introducing a macro or inline function in the block layer that tests
whether either the RCU read lock or an SRCU read lock is held depending on the
value of the BLK_MQ_F_BLOCKING flag?

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-30  0:27         ` Ming Lei
@ 2017-05-30 17:02           ` Bart Van Assche
  2017-05-31  2:55             ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 17:02 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Tue, 2017-05-30 at 08:27 +0800, Ming Lei wrote:
> On Sun, May 28, 2017 at 04:03:20PM +0000, Bart Van Assche wrote:
> > On Sun, 2017-05-28 at 18:50 +0800, Ming Lei wrote:
> > > On Sat, May 27, 2017 at 09:49:27PM +0000, Bart Van Assche wrote:
> > > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > > > index 032045841856..84cce67caee3 100644
> > > > > --- a/block/blk-mq.c
> > > > > +++ b/block/blk-mq.c
> > > > > @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> > > > >  	unsigned int i;
> > > > >  	bool rcu = false;
> > > > >  
> > > > > -	__blk_mq_stop_hw_queues(q, true);
> > > > > -
> > > > >  	spin_lock_irq(q->queue_lock);
> > > > >  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > > > >  	spin_unlock_irq(q->queue_lock);
> > > > > @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > > > >  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > > > >  	spin_unlock_irq(q->queue_lock);
> > > > >  
> > > > > -	blk_mq_start_stopped_hw_queues(q, true);
> > > > > +	/*
> > > > > +	 * During quiescing, requests can be inserted
> > > > > +	 * to scheduler's queue or sw queue, so we run
> > > > > +	 * queues for dispatching these requests.
> > > > > +	 */
> > > > > +	blk_mq_start_hw_queues(q);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> > > > 
> > > > This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
> > > > hardware queues, why should blk_mq_unquiesce_queue() start any hardware
> > > > queues?
> > > 
> > > Please see the above comment, especially in direct issue path, we have to
> > > insert the request into sw/scheduler queue when queue is quiesced,
> > > and these queued requests have to be dispatched out during unquiescing.
> > > 
> > > I considered to sleep the current context under this situation, but
> > > turns out it is more complicated to handle, given we have very limited
> > > queue depth, just let applications consumes all, then wait.
> > 
> > Hello Ming,
> > 
> > The above seems to me to be an argument to *run* the queue from inside
> > blk_mq_unquiesce_queue() instead of *starting* stopped queues from inside
> > that function.
> 
> Yes, it is run queues, as you can see in my comment.
> 
> The reality is that we can't run queue without clearing the STOPPED state.

Hello Ming,

I think it's completely wrong to make blk_mq_unquiesce_queue() to clear the
STOPPED state. Stopping a queue is typically used to tell the block layer to
stop dispatching requests and not to resume dispatching requests until the
STOPPED flag is cleared. If stopping and quiescing a queue happen
simultaneously then dispatching should not occur before both
blk_mq_unquiesce_queue() and blk_mq_start_hw_queue() have been called. Your
patch would make dispatching start too early.

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue()
  2017-05-27 14:21 ` [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
@ 2017-05-30 17:14   ` Bart Van Assche
  2017-05-31  9:51     ` Ming Lei
  0 siblings, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 17:14 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
>  /**
> - * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
> + * blk_mq_quiesce_queue() - wait until all ongoing dispatching have finished
>   * @q: request queue.
>   *

Hello Ming,

The concept of dispatching does not have a meaning to block driver authors who are
not familiar with the block layer internals. However, every author of a blk-mq driver
knows what the .queue_rq() function is. Additionally, the new comment is grammatically
incorrect. So the above change looks like a step in the wrong direction to me.

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-27 14:21   ` Ming Lei
  (?)
@ 2017-05-30 19:04     ` Eduardo Valentin
  -1 siblings, 0 replies; 42+ messages in thread
From: Eduardo Valentin @ 2017-05-30 19:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-nvme, linux-scsi, dm-devel

On Sat, May 27, 2017 at 10:21:21PM +0800, Ming Lei wrote:
> blk_mq_unquiesce_queue() is used for unquiescing the
> queue explicitly, so replace blk_mq_start_stopped_hw_queues()
> with it.
> 
> Cc: linux-nvme@lists.infradead.org
> Cc: linux-scsi@vger.kernel.org
> Cc: dm-devel@redhat.com
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/md/dm-rq.c       | 2 +-
>  drivers/nvme/host/core.c | 2 +-
>  drivers/scsi/scsi_lib.c  | 5 ++++-
>  3 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 2af27026aa2e..673fcf075077 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
>  
>  static void dm_mq_start_queue(struct request_queue *q)
>  {
> -	blk_mq_start_stopped_hw_queues(q, true);
> +	blk_mq_unquiesce_queue(q);
>  	blk_mq_kick_requeue_list(q);
>  }
>  
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 04e115834702..231d36028afc 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> -		blk_mq_start_stopped_hw_queues(ns->queue, true);
> +		blk_mq_unquiesce_queue(ns->queue);
>  		blk_mq_kick_requeue_list(ns->queue);
>  	}
>  	mutex_unlock(&ctrl->namespaces_mutex);
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 814a4bd8405d..72b11f75719c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
>  		return -EINVAL;
>  
>  	if (q->mq_ops) {
> -		blk_mq_start_stopped_hw_queues(q, false);
> +		if (blk_queue_quiesced(q))
> +			blk_mq_unquiesce_queue(q);

Calling this here, at this point means:
		blk_mq_start_stopped_hw_queues(q, true);

Does it make a difference, given that the code here previously always called
		blk_mq_start_stopped_hw_queues(q, false);


> +		else
> +			blk_mq_start_stopped_hw_queues(q, false);

Why do you need to care about the case of !blk_queue_quiesced(q)?

>  	} else {
>  		spin_lock_irqsave(q->queue_lock, flags);
>  		blk_start_queue(q);
> -- 
> 2.9.4
> 
> 

-- 
All the best,
Eduardo Valentin

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-28 10:44     ` Ming Lei
  2017-05-28 16:10       ` Bart Van Assche
@ 2017-05-30 19:23       ` Bart Van Assche
  2017-05-31  2:52         ` Ming Lei
  1 sibling, 1 reply; 42+ messages in thread
From: Bart Van Assche @ 2017-05-30 19:23 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> First it is really a fix, and then a improvement, so could you tell me
> where is wrong with the title and the description?

Hello Ming,

Can you explain to me why you want to keep the blk_mq_stop_hw_queues() call in
nvme_suspend_queue()? Since immediately after that call the NVMe driver starts
freeing resources, shouldn't that call be changed into a call of
blk_mq_quiesce_queue()? I think the same comment also applies to the
blk_mq_stop_hw_queues() calls in nvme_rdma_error_recovery_work() and
nvme_rdma_shutdown_ctrl().

Bart.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-30 19:04     ` Eduardo Valentin
@ 2017-05-31  2:28       ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  2:28 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
	linux-nvme, linux-scsi, dm-devel

On Tue, May 30, 2017 at 12:04:02PM -0700, Eduardo Valentin wrote:
> On Sat, May 27, 2017 at 10:21:21PM +0800, Ming Lei wrote:
> > blk_mq_unquiesce_queue() is used for unquiescing the
> > queue explicitly, so replace blk_mq_start_stopped_hw_queues()
> > with it.
> > 
> > Cc: linux-nvme@lists.infradead.org
> > Cc: linux-scsi@vger.kernel.org
> > Cc: dm-devel@redhat.com
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/md/dm-rq.c       | 2 +-
> >  drivers/nvme/host/core.c | 2 +-
> >  drivers/scsi/scsi_lib.c  | 5 ++++-
> >  3 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > index 2af27026aa2e..673fcf075077 100644
> > --- a/drivers/md/dm-rq.c
> > +++ b/drivers/md/dm-rq.c
> > @@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
> >  
> >  static void dm_mq_start_queue(struct request_queue *q)
> >  {
> > -	blk_mq_start_stopped_hw_queues(q, true);
> > +	blk_mq_unquiesce_queue(q);
> >  	blk_mq_kick_requeue_list(q);
> >  }
> >  
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 04e115834702..231d36028afc 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
> >  
> >  	mutex_lock(&ctrl->namespaces_mutex);
> >  	list_for_each_entry(ns, &ctrl->namespaces, list) {
> > -		blk_mq_start_stopped_hw_queues(ns->queue, true);
> > +		blk_mq_unquiesce_queue(ns->queue);
> >  		blk_mq_kick_requeue_list(ns->queue);
> >  	}
> >  	mutex_unlock(&ctrl->namespaces_mutex);
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 814a4bd8405d..72b11f75719c 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
> >  		return -EINVAL;
> >  
> >  	if (q->mq_ops) {
> > -		blk_mq_start_stopped_hw_queues(q, false);
> > +		if (blk_queue_quiesced(q))
> > +			blk_mq_unquiesce_queue(q);
> 
> Calling this here, at this point means:
> 		blk_mq_start_stopped_hw_queues(q, true);
> 
> Does it make a difference, given that before the code always calling 
> 		blk_mq_start_stopped_hw_queues(q, false);

Good catch, it should have been:

		if (blk_queue_quiesced(q))
			blk_mq_unquiesce_queue(q);
		else
			blk_mq_start_stopped_hw_queues(q, false);

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-30 15:12     ` Bart Van Assche
@ 2017-05-31  2:29       ` Ming Lei
  -1 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  2:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, linux-scsi, dm-devel, linux-nvme

On Tue, May 30, 2017 at 03:12:41PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
> >  		return -EINVAL;
> >  
> >  	if (q->mq_ops) {
> > -		blk_mq_start_stopped_hw_queues(q, false);
> > +		if (blk_queue_quiesced(q))
> > +			blk_mq_unquiesce_queue(q);
> > +		else
> > +			blk_mq_start_stopped_hw_queues(q, false);
> >  	} else {
> >  		spin_lock_irqsave(q->queue_lock, flags);
> >  		blk_start_queue(q);
> 
> Hello Ming,
> 
> Sorry but that change looks wrong to me. All what's needed here is a call
> to blk_mq_unquiesce_queue().

I think blk_mq_unquiesce_queue() should be called for the case where the
queue is quiesced.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-30 16:54           ` Bart Van Assche
@ 2017-05-31  2:38             ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  2:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Tue, May 30, 2017 at 04:54:47PM +0000, Bart Van Assche wrote:
> On Tue, 2017-05-30 at 08:22 +0800, Ming Lei wrote:
> > On Sun, May 28, 2017 at 04:10:09PM +0000, Bart Van Assche wrote:
> > > I really would like to see the blk_queue_quiesced() tests as close as possible to
> > > the blk_mq_hctx_stopped() tests. But I agree that we need a way to document and/or
> > 
> > Could you explain why we have to do that? And checking on stopped state
> > doesn't need to hold RCU/SRCU read lock, and that two states are really
> > different.
> 
> I'm really surprised that you ask me why ... It's because the purpose of the
> "stopped" and "quiesced" flags is similar, namely preventing that dispatching

Actually they are not, and you can see the difference in patch 7.

But I will move the check into the dispatch function, and add a comment about
the RCU/SRCU read-lock requirement.

> requests happens. It doesn't matter that with your patches applied it is no
> longer needed to hold an RCU / SRCU lock when testing the stopped flag.
> 
> > > verify that these tests occur with an RCU read-side lock held. Have you considered
> > > to use rcu_read_lock_held() to document this?
> > 
> > Then we need to check if it is RCU or SRCU, and make code ugly as
> > current check on BLOCKING.
> 
> How about introducing a macro or inline function in the block layer that tests
> whether either the RCU read lock or an SRCU read lock is held depending on the
> value of the BLK_MQ_F_BLOCKING flag?

That might make the code cleaner, but it would need to check the condition twice.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue
  2017-05-30 19:23       ` Bart Van Assche
@ 2017-05-31  2:52         ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  2:52 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Tue, May 30, 2017 at 07:23:31PM +0000, Bart Van Assche wrote:
> On Sun, 2017-05-28 at 18:44 +0800, Ming Lei wrote:
> > First it is really a fix, and then a improvement, so could you tell me
> > where is wrong with the title and the description?
> 
> Hello Ming,
> 
> Can you explain to me why you want to keep the blk_mq_stop_hw_queues() call in
> nvme_suspend_queue()? Since immediately after that call the NVMe driver starts
> freeing resources, shouldn't that call be changed into a call of
> blk_mq_quiesce_queue()? I think the same comment also applies to the
> blk_mq_stop_hw_queues() calls in nvme_rdma_error_recovery_work() and
> nvme_rdma_shutdown_ctrl().

Not only NVMe; there are also other suspend cases in virtio-blk, dm, and ...

But that doesn't belong to this patchset, and is definitely a follow-up.
The big purpose of this patchset is to make blk_mq_quiesce_queue() easy to
use and correct, avoiding potential races.

Then we can fix the cases of canceling requests, and other cases like the
ones you mentioned.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 6/8] blk-mq: don't stop queue for quiescing
  2017-05-30 17:02           ` Bart Van Assche
@ 2017-05-31  2:55             ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  2:55 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Tue, May 30, 2017 at 05:02:23PM +0000, Bart Van Assche wrote:
> On Tue, 2017-05-30 at 08:27 +0800, Ming Lei wrote:
> > On Sun, May 28, 2017 at 04:03:20PM +0000, Bart Van Assche wrote:
> > > On Sun, 2017-05-28 at 18:50 +0800, Ming Lei wrote:
> > > > On Sat, May 27, 2017 at 09:49:27PM +0000, Bart Van Assche wrote:
> > > > > On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> > > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > > > > index 032045841856..84cce67caee3 100644
> > > > > > --- a/block/blk-mq.c
> > > > > > +++ b/block/blk-mq.c
> > > > > > @@ -168,8 +168,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
> > > > > >  	unsigned int i;
> > > > > >  	bool rcu = false;
> > > > > >  
> > > > > > -	__blk_mq_stop_hw_queues(q, true);
> > > > > > -
> > > > > >  	spin_lock_irq(q->queue_lock);
> > > > > >  	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > > > > >  	spin_unlock_irq(q->queue_lock);
> > > > > > @@ -198,7 +196,12 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > > > > >  	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > > > > >  	spin_unlock_irq(q->queue_lock);
> > > > > >  
> > > > > > -	blk_mq_start_stopped_hw_queues(q, true);
> > > > > > +	/*
> > > > > > +	 * During quiescing, requests can be inserted
> > > > > > +	 * to scheduler's queue or sw queue, so we run
> > > > > > +	 * queues for dispatching these requests.
> > > > > > +	 */
> > > > > > +	blk_mq_start_hw_queues(q);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
> > > > > 
> > > > > This looks really weird to me. If blk_mq_quiesce_queue() doesn't stop any
> > > > > hardware queues, why should blk_mq_unquiesce_queue() start any hardware
> > > > > queues?
> > > > 
> > > > Please see the comment above: in the direct issue path in particular, we
> > > > have to insert the request into the sw/scheduler queue when the queue is
> > > > quiesced, and these queued requests have to be dispatched during unquiescing.
> > > > 
> > > > I considered putting the current context to sleep in this situation, but
> > > > it turned out to be more complicated to handle: given the very limited
> > > > queue depth, applications would simply consume all of it and then wait.
> > > 
> > > Hello Ming,
> > > 
> > > The above seems to me to be an argument to *run* the queue from inside
> > > blk_mq_unquiesce_queue() instead of *starting* stopped queues from inside
> > > that function.
> > 
> > Yes, it does run the queues, as you can see in my comment.
> > 
> > The reality is that we can't run a queue without clearing the STOPPED state.
> 
> Hello Ming,
> 
> I think it's completely wrong to make blk_mq_unquiesce_queue() to clear the
> STOPPED state. Stopping a queue is typically used to tell the block layer to
> stop dispatching requests and not to resume dispatching requests until the
> STOPPED flag is cleared. If stopping and quiescing a queue happen
> simultaneously then dispatching should not occur before both
> blk_mq_unquiesce_queue() and blk_mq_start_hw_queue() have been called. Your
> patch would make dispatching start too early.

Yes, I thought of that too; now we need to put the direct-issue context to
sleep, so the restart in blk_mq_unquiesce_queue() can be avoided.


Thanks,
Ming


* Re: [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue()
  2017-05-30 17:14   ` Bart Van Assche
@ 2017-05-31  9:51     ` Ming Lei
  0 siblings, 0 replies; 42+ messages in thread
From: Ming Lei @ 2017-05-31  9:51 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Tue, May 30, 2017 at 05:14:44PM +0000, Bart Van Assche wrote:
> On Sat, 2017-05-27 at 22:21 +0800, Ming Lei wrote:
> >  /**
> > - * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
> > + * blk_mq_quiesce_queue() - wait until all ongoing dispatching have finished
> >   * @q: request queue.
> >   *
> 
> Hello Ming,
> 
> The concept of dispatching is not meaningful to block driver authors who are
> not familiar with the block layer internals. However, every author of a blk-mq
> driver knows what the .queue_rq() function is.

Unfortunately it isn't enough to just block .queue_rq(); did you read the
commit log?

> Additionally, the new comment is grammatically
> incorrect.
> So the above change looks like a step in the wrong direction to me.

Sorry, I simply don't agree; we have to make it explicit.

-- 
Ming


end of thread, other threads:[~2017-05-31  9:52 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-27 14:21 [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Ming Lei
2017-05-27 14:21 ` [PATCH v2 1/8] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
2017-05-30 15:09   ` Bart Van Assche
2017-05-27 14:21 ` [PATCH v2 2/8] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
2017-05-30 15:11   ` Bart Van Assche
2017-05-27 14:21 ` [PATCH v2 3/8] blk-mq: use the introduced blk_mq_unquiesce_queue() Ming Lei
2017-05-27 14:21   ` Ming Lei
2017-05-30 15:12   ` Bart Van Assche
2017-05-30 15:12     ` Bart Van Assche
2017-05-30 15:12     ` Bart Van Assche
2017-05-31  2:29     ` Ming Lei
2017-05-31  2:29       ` Ming Lei
2017-05-30 19:04   ` Eduardo Valentin
2017-05-30 19:04     ` Eduardo Valentin
2017-05-30 19:04     ` Eduardo Valentin
2017-05-31  2:28     ` Ming Lei
2017-05-31  2:28       ` Ming Lei
2017-05-27 14:21 ` [PATCH v2 4/8] blk-mq: fix blk_mq_quiesce_queue Ming Lei
2017-05-27 21:46   ` Bart Van Assche
2017-05-28 10:44     ` Ming Lei
2017-05-28 16:10       ` Bart Van Assche
2017-05-30  0:22         ` Ming Lei
2017-05-30 16:54           ` Bart Van Assche
2017-05-31  2:38             ` Ming Lei
2017-05-30 19:23       ` Bart Van Assche
2017-05-31  2:52         ` Ming Lei
2017-05-27 14:21 ` [PATCH v2 5/8] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
2017-05-30 17:14   ` Bart Van Assche
2017-05-31  9:51     ` Ming Lei
2017-05-27 14:21 ` [PATCH v2 6/8] blk-mq: don't stop queue for quiescing Ming Lei
2017-05-27 21:49   ` Bart Van Assche
2017-05-28 10:50     ` Ming Lei
2017-05-28 16:03       ` Bart Van Assche
2017-05-30  0:27         ` Ming Lei
2017-05-30 17:02           ` Bart Van Assche
2017-05-31  2:55             ` Ming Lei
2017-05-27 14:21 ` [PATCH v2 7/8] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
2017-05-27 14:21 ` [PATCH v2 8/8] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei
2017-05-27 21:32 ` [PATCH v2 0/8] blk-mq: fix & improve queue quiescing Bart Van Assche
2017-05-28 11:11   ` Ming Lei
2017-05-28 16:01     ` Bart Van Assche
2017-05-30  0:34       ` Ming Lei
