* [PATCH v3 0/9] blk-mq: fix & improve queue quiescing
@ 2017-05-31 12:36 Ming Lei
  2017-05-31 12:36 ` [PATCH v3 1/9] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:36 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

There is one big issue in current blk_mq_quiesce_queue():

    - in case of direct issue or BLK_MQ_S_START_ON_RUN, dispatch isn't
    prevented after blk_mq_quiesce_queue() returns.

It has been observed that a request double-free/use-after-free
can be triggered easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable().
The reason is that blk_mq_quiesce_queue() can't prevent dispatch
from running during that period.

Actually the queue has to be quiesced before canceling dispatched
requests via blk_mq_tagset_busy_iter(); otherwise a use-after-free is
easy to trigger. This way of canceling dispatched requests is used in
several drivers, but only NVMe calls blk_mq_quiesce_queue() to avoid the
issue, so the others need to be fixed too. It should become the common
way of handling a dead controller.
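
The intended pattern looks roughly like the following sketch, modelled on
nvme_dev_disable() (simplified illustration, not a literal excerpt from the
driver):

static void demo_disable(struct nvme_dev *dev)
{
	/* quiesce all namespace queues (blk_mq_quiesce_queue() on each) */
	nvme_stop_queues(&dev->ctrl);

	/* only now is it safe to walk and cancel the dispatched requests */
	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
}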

blk_mq_quiesce_queue() is implemented by stopping the queue, which both
limits its use and makes races easy, because any queue restart in another
path may break blk_mq_quiesce_queue(). For example, a driver sometimes
stops the queue when the hardware can't handle more outstanding requests
and restarts it once requests complete. Meanwhile, when requests have to
be canceled because the hardware is dead or a suspend request is received,
quiescing has to run first, and the restart in the completion path can
then break the quiescing. This series improves the interface by removing
the queue stopping, which makes it easier to use.
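
To make the race concrete, consider a driver pattern like the following
(hypothetical callbacks; demo_hw_busy() is a made-up helper, and error
handling is omitted):

static int demo_queue_rq(struct blk_mq_hw_ctx *hctx,
			 const struct blk_mq_queue_data *bd)
{
	if (demo_hw_busy(hctx->driver_data)) {
		/* back off: sets BLK_MQ_S_STOPPED */
		blk_mq_stop_hw_queue(hctx);
		return BLK_MQ_RQ_QUEUE_BUSY;
	}
	/* hand bd->rq to the hardware here */
	return BLK_MQ_RQ_QUEUE_OK;
}

static void demo_complete_request(struct request *rq)
{
	blk_mq_end_request(rq, 0);
	/*
	 * Restart stopped queues now that resources are free again; this
	 * clears BLK_MQ_S_STOPPED and therefore also breaks a concurrent
	 * stop-based blk_mq_quiesce_queue() in another context.
	 */
	blk_mq_start_stopped_hw_queues(rq->q, true);
}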

V3:
	- wait until the queue becomes unquiesced in the direct issue path, so
	we can avoid queueing the current req into the sw queue or scheduler
	queue; then the STOPPED state needn't be touched
	- move the !blk_queue_quiesced() check into blk_mq_sched_dispatch_requests()
	as suggested by Bart
	- NVMe: unquiesce queue in nvme_kill_queues()
	- misc changes (fix grammar issues in commit logs or comments, ...)

V2:
	- split patch "blk-mq: fix blk_mq_quiesce_queue" into two and
  	fix one build issue when only applying the 1st two patches.
	- add kernel oops and hang log into commit log
	- add 'Revert "blk-mq: don't use sync workqueue flushing from drivers"'

Ming Lei (9):
  blk-mq: introduce blk_mq_unquiesce_queue
  block: introduce flag of QUEUE_FLAG_QUIESCED
  blk-mq: use the introduced blk_mq_unquiesce_queue()
  nvme: host: unquiesce queue in nvme_kill_queues()
  blk-mq: fix blk_mq_quiesce_queue
  blk-mq: update comments on blk_mq_quiesce_queue()
  blk-mq: don't stop queue for quiescing
  blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  Revert "blk-mq: don't use sync workqueue flushing from drivers"

 block/blk-mq-sched.c     |  3 +-
 block/blk-mq.c           | 78 ++++++++++++++++++++++++++++++++----------------
 drivers/md/dm-rq.c       |  2 +-
 drivers/nvme/host/core.c |  5 +++-
 drivers/scsi/scsi_lib.c  |  5 +++-
 include/linux/blkdev.h   |  6 ++++
 6 files changed, 70 insertions(+), 29 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/9] blk-mq: introduce blk_mq_unquiesce_queue
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
@ 2017-05-31 12:36 ` Ming Lei
  2017-05-31 12:36 ` [PATCH v3 2/9] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:36 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

blk_mq_start_stopped_hw_queues() is used implicitly
as the counterpart of blk_mq_quiesce_queue() for unquiescing the queue,
so introduce blk_mq_unquiesce_queue() and make it the explicit
counterpart of blk_mq_quiesce_queue().

This function will be used to improve the current quiescing mechanism
in the following patches.
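
A typical pairing in a driver would look like this (illustrative sketch,
not part of this patch):

static void demo_freeze_dispatch(struct request_queue *q)
{
	/* wait for ongoing dispatch and prevent new dispatch */
	blk_mq_quiesce_queue(q);

	/* ... cancel or requeue driver-owned requests here ... */

	/* allow dispatch again */
	blk_mq_unquiesce_queue(q);
}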

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 13 +++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22438d5036a3..2f0d80bad7be 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -181,6 +181,19 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
+/*
+ * blk_mq_unquiesce_queue() - counterpart of blk_mq_quiesce_queue()
+ * @q: request queue.
+ *
+ * This function recovers queue into the state before quiescing
+ * which is done by blk_mq_quiesce_queue.
+ */
+void blk_mq_unquiesce_queue(struct request_queue *q)
+{
+	blk_mq_start_stopped_hw_queues(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab92c4ea138b..41291be82ac4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,6 +964,7 @@ extern void __blk_run_queue_uncond(struct request_queue *q);
 extern void blk_run_queue(struct request_queue *);
 extern void blk_run_queue_async(struct request_queue *q);
 extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_unquiesce_queue(struct request_queue *q);
 extern int blk_rq_map_user(struct request_queue *, struct request *,
 			   struct rq_map_data *, void __user *, unsigned long,
 			   gfp_t);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/9] block: introduce flag of QUEUE_FLAG_QUIESCED
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
  2017-05-31 12:36 ` [PATCH v3 1/9] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
@ 2017-05-31 12:36 ` Ming Lei
  2017-05-31 12:37   ` Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:36 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

This flag is introduced for improving the quiescing code.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 41291be82ac4..60967797f4f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -618,6 +618,7 @@ struct request_queue {
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
 #define QUEUE_FLAG_POLL_STATS  28	/* collecting stats for hybrid polling */
 #define QUEUE_FLAG_REGISTERED  29	/* queue has been registered to a disk */
+#define QUEUE_FLAG_QUIESCED    30	/* queue has been quiesced */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -712,6 +713,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
+#define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
@ 2017-05-31 12:37   ` Ming Lei
  2017-05-31 12:36 ` [PATCH v3 2/9] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Ming Lei, linux-nvme, linux-scsi, dm-devel

blk_mq_unquiesce_queue() is used for unquiescing the
queue explicitly, so replace blk_mq_start_stopped_hw_queues()
with it.

Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: dm-devel@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c       | 2 +-
 drivers/nvme/host/core.c | 2 +-
 drivers/scsi/scsi_lib.c  | 5 ++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b639fa7246ee..ea4029de077f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
 
 static void dm_mq_start_queue(struct request_queue *q)
 {
-	blk_mq_start_stopped_hw_queues(q, true);
+	blk_mq_unquiesce_queue(q);
 	blk_mq_kick_requeue_list(q);
 }
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a60926410438..c3f189e54d10 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2526,7 +2526,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_start_stopped_hw_queues(ns->queue, true);
+		blk_mq_unquiesce_queue(ns->queue);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 99e16ac479e3..ffcf05765e2b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3031,7 +3031,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 		return -EINVAL;
 
 	if (q->mq_ops) {
-		blk_mq_start_stopped_hw_queues(q, false);
+		if (blk_queue_quiesced(q))
+			blk_mq_unquiesce_queue(q);
+		else
+			blk_mq_start_stopped_hw_queues(q, false);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_start_queue(q);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/9] nvme: host: unquiesce queue in nvme_kill_queues()
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
@ 2017-05-31 12:37   ` Ming Lei
  2017-05-31 12:36 ` [PATCH v3 2/9] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig
  Cc: Bart Van Assche, Ming Lei, linux-nvme

When nvme_kill_queues() is run, queues may be in the quiesced state,
so forcibly unquiesce them to avoid blocking dispatch; this way an
I/O hang in the remove path is avoided.

Previously blk_mq_start_stopped_hw_queues() was used as the
counterpart of blk_mq_quiesce_queue(); now that
blk_mq_unquiesce_queue() has been introduced, use it explicitly.

Cc: linux-nvme@lists.infradead.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3f189e54d10..e44326d5cf19 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2448,6 +2448,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
 
+		/* Forcibly unquiesce queues to avoid blocking dispatch */
+		blk_mq_unquiesce_queue(ns->queue);
+
 		/*
 		 * Forcibly start all queues to avoid having stuck requests.
 		 * Note that we must ensure the queues are not stopped
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2017-05-31 12:37   ` Ming Lei
@ 2017-05-31 12:37 ` Ming Lei
  2017-05-31 15:37   ` Bart Van Assche
  2017-05-31 12:37 ` [PATCH v3 6/9] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns, and there is no such requirement on the
APIs for stopping the queue.

But blk_mq_quiesce_queue() still may not block/drain dispatch in the
following cases:

- direct issue
- BLK_MQ_S_START_ON_RUN

So use the new QUEUE_FLAG_QUIESCED flag and evaluate it inside RCU
read-side critical sections to fix the above issues.
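
The dispatch-side rule can be sketched as follows (simplified; the real
checks live in blk_mq_sched_dispatch_requests() and in the direct issue
path changed below):

	rcu_read_lock();	/* or srcu_read_lock() for BLK_MQ_F_BLOCKING */
	if (!blk_queue_quiesced(hctx->queue))
		blk_mq_sched_dispatch_requests(hctx);
	rcu_read_unlock();

Since blk_mq_quiesce_queue() sets QUEUE_FLAG_QUIESCED before calling
synchronize_rcu()/synchronize_srcu(), any dispatch that started before the
flag was visible has finished by the time the function returns, and any
later dispatch sees the flag and bails out.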

This patch fixes the request use-after-free[1][2] seen while canceling
NVMe requests in nvme_dev_disable(), which can be triggered easily during
NVMe reset & remove tests.

[1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on
[  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[  103.412980] IP: bio_integrity_advance+0x48/0xf0
[  103.412981] PGD 275a88067
[  103.412981] P4D 275a88067
[  103.412982] PUD 276c43067
[  103.412983] PMD 0
[  103.412984]
[  103.412986] Oops: 0000 [#1] SMP
[  103.412989] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
[  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
[  103.413043] task: ffff9cc8775c8000 task.stack: ffffc033c252c000
[  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
[  103.413046] RSP: 0018:ffffc033c252fc10 EFLAGS: 00010202
[  103.413048] RAX: 0000000000000000 RBX: ffff9cc8720a8cc0 RCX: ffff9cca72958240
[  103.413049] RDX: ffff9cca72958000 RSI: 0000000000000008 RDI: ffff9cc872537f00
[  103.413049] RBP: ffffc033c252fc28 R08: 0000000000000000 R09: ffffffffb963a0d5
[  103.413050] R10: 000000000000063e R11: 0000000000000000 R12: ffff9cc8720a8d18
[  103.413051] R13: 0000000000001000 R14: ffff9cc872682e00 R15: 00000000fffffffb
[  103.413053] FS:  0000000000000000(0000) GS:ffff9cc877c00000(0000) knlGS:0000000000000000
[  103.413054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.413055] CR2: 000000000000000a CR3: 0000000276c41000 CR4: 00000000001406f0
[  103.413056] Call Trace:
[  103.413063]  bio_advance+0x2a/0xe0
[  103.413067]  blk_update_request+0x76/0x330
[  103.413072]  blk_mq_end_request+0x1a/0x70
[  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
[  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
[  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
[  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
[  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
[  103.413088]  blk_mq_start_hw_queue+0x17/0x20
[  103.413090]  blk_mq_start_hw_queues+0x32/0x50
[  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
[  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
[  103.413103]  process_one_work+0x149/0x360
[  103.413105]  worker_thread+0x4d/0x3c0
[  103.413109]  kthread+0x109/0x140
[  103.413111]  ? rescuer_thread+0x380/0x380
[  103.413113]  ? kthread_park+0x60/0x60
[  103.413120]  ret_from_fork+0x2c/0x40
[  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
[  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: ffffc033c252fc10
[  103.413146] CR2: 000000000000000a
[  103.413157] ---[ end trace cd6875d16eb5a11e ]---
[  103.455368] Kernel panic - not syncing: Fatal exception
[  103.459826] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
[  103.857637] sched: Unexpected reschedule of offline CPU#1!
[  103.863762] ------------[ cut here ]------------

[2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off
[  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
[  247.137311]       Not tainted 4.12.0-rc2.upstream+ #4
[  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.151704] Call Trace:
[  247.154445]  __schedule+0x28a/0x880
[  247.158341]  schedule+0x36/0x80
[  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  247.166913]  ? remove_wait_queue+0x60/0x60
[  247.171485]  blk_freeze_queue+0x1a/0x20
[  247.175770]  blk_cleanup_queue+0x7f/0x140
[  247.180252]  nvme_ns_remove+0xa3/0xb0 [nvme_core]
[  247.185503]  nvme_remove_namespaces+0x32/0x50 [nvme_core]
[  247.191532]  nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
[  247.196977]  nvme_remove+0x70/0x110 [nvme]
[  247.201545]  pci_device_remove+0x39/0xc0
[  247.205927]  device_release_driver_internal+0x141/0x200
[  247.211761]  device_release_driver+0x12/0x20
[  247.216531]  pci_stop_bus_device+0x8c/0xa0
[  247.221104]  pci_stop_and_remove_bus_device_locked+0x1a/0x30
[  247.227420]  remove_store+0x7c/0x90
[  247.231320]  dev_attr_store+0x18/0x30
[  247.235409]  sysfs_kf_write+0x3a/0x50
[  247.239497]  kernfs_fop_write+0xff/0x180
[  247.243867]  __vfs_write+0x37/0x160
[  247.247757]  ? selinux_file_permission+0xe5/0x120
[  247.253011]  ? security_file_permission+0x3b/0xc0
[  247.258260]  vfs_write+0xb2/0x1b0
[  247.261964]  ? syscall_trace_enter+0x1d0/0x2b0
[  247.266924]  SyS_write+0x55/0xc0
[  247.270540]  do_syscall_64+0x67/0x150
[  247.274636]  entry_SYSCALL64_slow_path+0x25/0x25
[  247.279794] RIP: 0033:0x7f5c96740840
[  247.283785] RSP: 002b:00007ffd00e87ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  247.292238] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5c96740840
[  247.300194] RDX: 0000000000000002 RSI: 00007f5c97060000 RDI: 0000000000000001
[  247.308159] RBP: 00007f5c97060000 R08: 000000000000000a R09: 00007f5c97059740
[  247.316123] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5c96a14400
[  247.324087] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[  370.016340] INFO: task nvme-test:1772 blocked for more than 120 seconds.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c   |  3 ++-
 block/blk-mq.c         | 28 +++++++++++++++++++++++++---
 include/linux/blkdev.h |  3 +++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c4e2afb9d12d..ec9885df324c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -141,7 +141,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	bool did_work = false;
 	LIST_HEAD(rq_list);
 
-	if (unlikely(blk_mq_hctx_stopped(hctx)))
+	/* RCU or SRCU read lock is needed before checking quiesced flag */
+	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
 		return;
 
 	hctx->run++;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2f0d80bad7be..56ba7e7a8c60 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -170,6 +170,10 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 
 	__blk_mq_stop_hw_queues(q, true);
 
+	spin_lock_irq(q->queue_lock);
+	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irq(q->queue_lock);
+
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
 			synchronize_srcu(&hctx->queue_rq_srcu);
@@ -190,7 +194,13 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
+	spin_lock_irq(q->queue_lock);
+	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irq(q->queue_lock);
+
 	blk_mq_start_stopped_hw_queues(q, true);
+
+	wake_up_all(&q->quiesce_wq);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
@@ -1414,7 +1424,8 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 }
 
 static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
-				      bool may_sleep)
+					bool may_sleep,
+					unsigned int *srcu_idx)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1425,6 +1436,15 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
 	blk_qc_t new_cookie;
 	int ret;
 
+	/* wait until queue is unquiesced */
+	wait_event_cmd(q->quiesce_wq, !blk_queue_quiesced(q),
+			may_sleep ?
+			srcu_read_unlock(&hctx->queue_rq_srcu, *srcu_idx) :
+			rcu_read_unlock(),
+			may_sleep ?
+			*srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu) :
+			rcu_read_lock());
+
 	if (q->elevator)
 		goto insert;
 
@@ -1460,7 +1480,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 {
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		__blk_mq_try_issue_directly(rq, cookie, false);
+		__blk_mq_try_issue_directly(rq, cookie, false, NULL);
 		rcu_read_unlock();
 	} else {
 		unsigned int srcu_idx;
@@ -1468,7 +1488,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
-		__blk_mq_try_issue_directly(rq, cookie, true);
+		__blk_mq_try_issue_directly(rq, cookie, true, &srcu_idx);
 		srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
 	}
 }
@@ -2218,6 +2238,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
+	init_waitqueue_head(&q->quiesce_wq);
+
 	q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
 					     blk_mq_poll_stats_bkt,
 					     BLK_MQ_POLL_STATS_BKTS, q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 60967797f4f6..253d52d35bd1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -586,6 +586,9 @@ struct request_queue {
 
 	size_t			cmd_size;
 	void			*rq_alloc_data;
+
+	/* for blk_mq_quiesce_queue() */
+	wait_queue_head_t       quiesce_wq;
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 6/9] blk-mq: update comments on blk_mq_quiesce_queue()
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (4 preceding siblings ...)
  2017-05-31 12:37 ` [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue Ming Lei
@ 2017-05-31 12:37 ` Ming Lei
  2017-05-31 12:37 ` [PATCH v3 7/9] blk-mq: don't stop queue for quiescing Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

Actually what we want from blk_mq_quiesce_queue() isn't only to wait
for completion of all ongoing .queue_rq() calls.

In the typical context of canceling requests, we need to
make sure that the following is done in the dispatch path
before starting to cancel requests:

	- failed dispatched request is finished
	- busy dispatched request is requeued, and the STARTED
	flag is cleared

So update the comment to keep the code, documentation and our expectation consistent.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 56ba7e7a8c60..1faddaf005e2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -155,12 +155,13 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
  * @q: request queue.
  *
  * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
+ * callback function is invoked. Once this function is returned, we make
+ * sure no dispatch can happen until the queue is unquiesced via
+ * blk_mq_unquiesce_queue().
  */
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 7/9] blk-mq: don't stop queue for quiescing
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (5 preceding siblings ...)
  2017-05-31 12:37 ` [PATCH v3 6/9] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
@ 2017-05-31 12:37 ` Ming Lei
  2017-05-31 12:37 ` [PATCH v3 8/9] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
  2017-05-31 12:37 ` [PATCH v3 9/9] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

The queue can be started by other blk-mq APIs and is used in different
cases; this limits the use of blk_mq_quiesce_queue() when it is based on
stopping the queue, and makes it very hard to use, since users have to
handle the stop-queue APIs carefully to avoid breaking
blk_mq_quiesce_queue().

We have applied the QUIESCED flag for draining and blocking dispatch, so
it isn't necessary to stop the queue any more.

With queue stopping removed, blk_mq_quiesce_queue() can be used safely
and easily, and users no longer have to worry about the queue being
restarted during quiescing.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1faddaf005e2..bcff1b184bbb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -169,8 +169,6 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	unsigned int i;
 	bool rcu = false;
 
-	__blk_mq_stop_hw_queues(q, true);
-
 	spin_lock_irq(q->queue_lock);
 	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
@@ -199,8 +197,6 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 	spin_unlock_irq(q->queue_lock);
 
-	blk_mq_start_stopped_hw_queues(q, true);
-
 	wake_up_all(&q->quiesce_wq);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 8/9] blk-mq: clarify dispatch may not be drained/blocked by stopping queue
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (6 preceding siblings ...)
  2017-05-31 12:37 ` [PATCH v3 7/9] blk-mq: don't stop queue for quiescing Ming Lei
@ 2017-05-31 12:37 ` Ming Lei
  2017-05-31 12:37 ` [PATCH v3 9/9] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

BLK_MQ_S_STOPPED may not be observed by other concurrent I/O paths, so
we can't guarantee that dispatch won't happen after the queue-stopping
APIs return.

So clarify this fact and avoid potential misuse.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bcff1b184bbb..352aaaf0bcf9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1187,6 +1187,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queue() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	__blk_mq_stop_hw_queue(hctx, false);
@@ -1202,6 +1207,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
 		__blk_mq_stop_hw_queue(hctx, sync);
 }
 
+/*
+ * We do not guarantee that dispatch can be drained or blocked
+ * after blk_mq_stop_hw_queues() returns. Please use
+ * blk_mq_quiesce_queue() for that requirement.
+ */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
 	__blk_mq_stop_hw_queues(q, false);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 9/9] Revert "blk-mq: don't use sync workqueue flushing from drivers"
  2017-05-31 12:36 [PATCH v3 0/9] blk-mq: fix & improve queue quiescing Ming Lei
                   ` (7 preceding siblings ...)
  2017-05-31 12:37 ` [PATCH v3 8/9] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
@ 2017-05-31 12:37 ` Ming Lei
  8 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-05-31 12:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche, Ming Lei

This reverts commit 2719aa217e0d02 ("blk-mq: don't use sync workqueue
flushing from drivers") because only blk_mq_quiesce_queue() needed the
sync flush, and now that the queue is no longer stopped for quiescing,
the sync flush isn't needed any more.

Also switch blk_mq_stop_hw_queue() to cancel_delayed_work().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 352aaaf0bcf9..6be10102b343 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -42,7 +42,6 @@ static LIST_HEAD(all_q_list);
 
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1177,16 +1176,6 @@ bool blk_mq_queue_stopped(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_queue_stopped);
 
-static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
-{
-	if (sync)
-		cancel_delayed_work_sync(&hctx->run_work);
-	else
-		cancel_delayed_work(&hctx->run_work);
-
-	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
-}
-
 /*
  * We do not guarantee that dispatch can be drained or blocked
  * after blk_mq_stop_hw_queue() returns. Please use
@@ -1194,18 +1183,11 @@ static void __blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx, bool sync)
  */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	__blk_mq_stop_hw_queue(hctx, false);
-}
-EXPORT_SYMBOL(blk_mq_stop_hw_queue);
+	cancel_delayed_work(&hctx->run_work);
 
-static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
-{
-	struct blk_mq_hw_ctx *hctx;
-	int i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		__blk_mq_stop_hw_queue(hctx, sync);
+	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
 }
+EXPORT_SYMBOL(blk_mq_stop_hw_queue);
 
 /*
  * We do not guarantee that dispatch can be drained or blocked
@@ -1214,7 +1196,11 @@ static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
  */
 void blk_mq_stop_hw_queues(struct request_queue *q)
 {
-	__blk_mq_stop_hw_queues(q, false);
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		blk_mq_stop_hw_queue(hctx);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queues);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-31 12:37   ` Ming Lei
  (?)
@ 2017-05-31 15:21     ` Bart Van Assche
  -1 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-05-31 15:21 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei; +Cc: dm-devel, linux-nvme, linux-scsi

On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 99e16ac479e3..ffcf05765e2b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -3031,7 +3031,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
>  		return -EINVAL;
>  
>  	if (q->mq_ops) {
> -		blk_mq_start_stopped_hw_queues(q, false);
> +		if (blk_queue_quiesced(q))
> +			blk_mq_unquiesce_queue(q);
> +		else
> +			blk_mq_start_stopped_hw_queues(q, false);
>  	} else {
>  		spin_lock_irqsave(q->queue_lock, flags);
>  		blk_start_queue(q);

As I commented on v2, this change is really wrong. All what's needed here is
a call to blk_mq_unquiesce_queue() and nothing else. Adding a call to
blk_mq_start_stopped_hw_queues() is wrong because it makes it impossible to
use the STOPPED flag in the SCSI core to make the block layer core stop calling
.queue_rq() if a SCSI LLD returns "busy".

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue
  2017-05-31 12:37 ` [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue Ming Lei
@ 2017-05-31 15:37   ` Bart Van Assche
  2017-06-01  2:21     ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-05-31 15:37 UTC (permalink / raw)
  To: hch, linux-block, axboe, ming.lei

On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> 
> +	/* wait until queue is unquiesced */
> +	wait_event_cmd(q->quiesce_wq, !blk_queue_quiesced(q),
> +			may_sleep ?
> +			srcu_read_unlock(&hctx->queue_rq_srcu, *srcu_idx) :
> +			rcu_read_unlock(),
> +			may_sleep ?
> +			*srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu) :
> +			rcu_read_lock());
> +
>  	if (q->elevator)
>  		goto insert;

What I see is that in this patch a new waitqueue has been introduced
(quiesce_wq) and also that an explanation of why you think this new waitqueue
is needed is missing completely. Why is it that you think that the
synchronize_srcu() and synchronize_rcu() calls in blk_mq_quiesce_queue() are
not sufficient? If this new waitqueue is not needed, please remove that
waitqueue again.

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-05-31 15:21     ` Bart Van Assche
@ 2017-06-01  0:54       ` Ming Lei
  -1 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-06-01  0:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe, linux-scsi, dm-devel, linux-nvme

On Wed, May 31, 2017 at 03:21:41PM +0000, Bart Van Assche wrote:
> On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 99e16ac479e3..ffcf05765e2b 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -3031,7 +3031,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
> >  		return -EINVAL;
> >  
> >  	if (q->mq_ops) {
> > -		blk_mq_start_stopped_hw_queues(q, false);
> > +		if (blk_queue_quiesced(q))
> > +			blk_mq_unquiesce_queue(q);
> > +		else
> > +			blk_mq_start_stopped_hw_queues(q, false);
> >  	} else {
> >  		spin_lock_irqsave(q->queue_lock, flags);
> >  		blk_start_queue(q);
> 
> As I commented on v2, this change is really wrong. All what's needed here is
> a call to blk_mq_unquiesce_queue() and nothing else. Adding a call to
> blk_mq_start_stopped_hw_queues() is wrong because it makes it impossible to
> use the STOPPED flag in the SCSI core to make the block layer core stop calling
> .queue_rq() if a SCSI LLD returns "busy".

I am not sure if I understand your idea, could you explain a bit why it is wrong?

Let's see the function of scsi_internal_device_block():

	if (q->mq_ops) {
                if (wait)
                        blk_mq_quiesce_queue(q);
                else
                        blk_mq_stop_hw_queues(q);
	}

So the queue may be put into quiesced if 'wait' is true, or it is
stopped if 'wait' is false.

And this patch just makes the two SCSI APIs symmetrical.

Since we will not stop the queue in blk_mq_quiesce_queue() later,
I have to unquiesce a queue only if it is quiesced.

So suppose the queue is put into stopped in scsi_internal_device_block(),
do we expect not to restart it in scsi_internal_device_unblock() via
blk_mq_unquiesce_queue()?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue
  2017-05-31 15:37   ` Bart Van Assche
@ 2017-06-01  2:21     ` Ming Lei
  2017-06-01 23:19       ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Ming Lei @ 2017-06-01  2:21 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Wed, May 31, 2017 at 03:37:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > 
> > +	/* wait until queue is unquiesced */
> > +	wait_event_cmd(q->quiesce_wq, !blk_queue_quiesced(q),
> > +			may_sleep ?
> > +			srcu_read_unlock(&hctx->queue_rq_srcu, *srcu_idx) :
> > +			rcu_read_unlock(),
> > +			may_sleep ?
> > +			*srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu) :
> > +			rcu_read_lock());
> > +
> >  	if (q->elevator)
> >  		goto insert;
> 
> What I see is that in this patch a new waitqueue has been introduced
> (quiesce_wq) and also that an explanation of why you think this new waitqueue
> is needed is missing completely. Why is it that you think that the
> synchronize_srcu() and synchronize_rcu() calls in blk_mq_quiesce_queue() are
> not sufficient? If this new waitqueue is not needed, please remove that
> waitqueue again.

OK, the reason is simple, and it is only related with direct issue.

Under this situation, when the queue is quiesced, we have to

	- insert the current request into sw queue(scheduler queue)
	OR
	-wait until queue becomes unquiesced like what this patch is doing

The disadvantage of the 1st way is that we have to consider to run queue
again in blk_mq_unquiesce_queue() for the queued requests during quiescing.

For the 2nd way(what this patch is doing), one benefit is that application
can avoid to submit I/O to a quiesced queue. Another benefit is that we
needn't to consider to run queue in blk_mq_unquiesce_queue(). But with cost
of one waitqueue, the cost should be cheap, but if you persist on the 1st
approach, I am fine to change to that.
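
For reference, the 1st way would just fall back to the insert path in
__blk_mq_try_issue_directly(), something like this (illustrative sketch
only, not code from this series):

	/* RCU/SRCU read lock is already held here */
	if (unlikely(blk_queue_quiesced(q)))
		goto insert;

	if (q->elevator)
		goto insert;

and then blk_mq_unquiesce_queue() would also need to re-run the hw queues
so that the requests inserted while quiesced get dispatched.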


Thanks,
Ming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-06-01  0:54       ` Ming Lei
  (?)
@ 2017-06-01 23:09         ` Bart Van Assche
  -1 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-06-01 23:09 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-scsi, hch, linux-block, linux-nvme, axboe, dm-devel

On Thu, 2017-06-01 at 08:54 +0800, Ming Lei wrote:
> On Wed, May 31, 2017 at 03:21:41PM +0000, Bart Van Assche wrote:
> > On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 99e16ac479e3..ffcf05765e2b 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -3031,7 +3031,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
> > >  		return -EINVAL;
> > > 
> > >  	if (q->mq_ops) {
> > > -		blk_mq_start_stopped_hw_queues(q, false);
> > > +		if (blk_queue_quiesced(q))
> > > +			blk_mq_unquiesce_queue(q);
> > > +		else
> > > +			blk_mq_start_stopped_hw_queues(q, false);
> > >  	} else {
> > >  		spin_lock_irqsave(q->queue_lock, flags);
> > >  		blk_start_queue(q);
> > 
> > As I commented on v2, this change is really wrong. All what's needed here is
> > a call to blk_mq_unquiesce_queue() and nothing else. Adding a call to
> > blk_mq_start_stopped_hw_queues() is wrong because it makes it impossible to
> > use the STOPPED flag in the SCSI core to make the block layer core stop calling
> > .queue_rq() if a SCSI LLD returns "busy".
> 
> I am not sure if I understand your idea, could you explain a bit why it is wrong?
> 
> Let's see the function of scsi_internal_device_block():
> 
> 	if (q->mq_ops) {
>                 if (wait)
>                         blk_mq_quiesce_queue(q);
>                 else
>                         blk_mq_stop_hw_queues(q);
> 	}
> 
> So the queue may be put into quiesced if 'wait' is true, or it is
> stopped if 'wait' is false.
> 
> And this patch just makes the two SCSI APIs symmetrical.
> 
> Since we will not stop queue in blk_mq_quiesce_queue() later,
> I have to unquiese one queue only if it is quiesced.
> 
> So suppose the queue is put into stopped in scsi_internal_device_block(),
> do we expect not to restart it in scsi_internal_device_unblock() via
> blk_mq_unquiesce_queue()?

Hello Ming,

My opinion is that scsi_internal_device_block() and scsi_internal_device_unblock()
should be changed as follows for the scsi-mq code path:
* scsi_internal_device_block() should call blk_mq_quiesce_queue(q) if the "wait"
  argument is true and should set the QUEUE_FLAG_QUIESCED flag if the "wait"
  argument is false.
* scsi_internal_device_unblock() should call blk_mq_unquiesce_queue().

I am aware it sounds weird to set the QUEUE_FLAG_QUIESCED flag without waiting
until ongoing .queue_rq() calls have finished. The only driver that triggers
that code path is the mpt3sas driver. I think it's unfortunate that that driver
has ever been allowed to call scsi_internal_device_block() because it's the only
driver that calls that function from a context where sleeping is not allowed.
No matter whether the scsi_internal_device_block() call from the mpt3sas driver
sets the QUEUE_FLAG_QUIESCED or the BLK_MQ_S_STOPPED flag, I don't think that
will have the effect the authors of this driver intended. Unfortunately I'm not
familiar enough with the mpt3sas driver to fix that driver myself.

Note: these changes conflict with a patch SCSI patch series I started working on
about two months ago. See also https://www.spinics.net/lists/linux-scsi/msg109103.html.

Bart.
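
For clarity, a sketch of what those two bullets would translate to (an
illustration only, not a patch; the open-coded flag update for the
non-waiting case is an assumption):

static void demo_block_mq(struct request_queue *q, bool wait)
{
	if (wait) {
		blk_mq_quiesce_queue(q);
	} else {
		/* mark quiesced without draining ongoing .queue_rq() */
		spin_lock_irq(q->queue_lock);
		queue_flag_set(QUEUE_FLAG_QUIESCED, q);
		spin_unlock_irq(q->queue_lock);
	}
}

static void demo_unblock_mq(struct request_queue *q)
{
	blk_mq_unquiesce_queue(q);
}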

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue
  2017-06-01  2:21     ` Ming Lei
@ 2017-06-01 23:19       ` Bart Van Assche
  2017-06-02  2:02         ` Ming Lei
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-06-01 23:19 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, axboe

On Thu, 2017-06-01 at 10:21 +0800, Ming Lei wrote:
> On Wed, May 31, 2017 at 03:37:30PM +0000, Bart Van Assche wrote:
> > On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > >
> > > +	/* wait until queue is unquiesced */
> > > +	wait_event_cmd(q->quiesce_wq, !blk_queue_quiesced(q),
> > > +			may_sleep ?
> > > +			srcu_read_unlock(&hctx->queue_rq_srcu, *srcu_idx) :
> > > +			rcu_read_unlock(),
> > > +			may_sleep ?
> > > +			*srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu) :
> > > +			rcu_read_lock());
> > > +
> > >  	if (q->elevator)
> > >  		goto insert;
> > 
> > What I see is that in this patch a new waitqueue has been introduced
> > (quiesce_wq) and also that an explanation of why you think this new waitqueue
> > is needed is missing completely. Why is it that you think that the
> > synchronize_scru() and synchronize_rcu() calls in blk_mq_quiesce_queue() are
> > not sufficient? If this new waitqueue is not needed, please remove that
> > waitqueue again.
> 
> OK, the reason is simple, and it is only related with direct issue.
> 
> Under this situation, when the queue is quiesced, we have to
> 
> 	- insert the current request into sw queue(scheduler queue)
> 	OR
> 	-wait until queue becomes unquiesced like what this patch is doing
> 
> The disadvantage of the 1st way is that we have to consider to run queue
> again in blk_mq_unquiesce_queue() for the queued requests during quiescing.
> 
> For the 2nd way(what this patch is doing), one benefit is that application
> can avoid to submit I/O to a quiesced queue. Another benefit is that we
> needn't to consider to run queue in blk_mq_unquiesce_queue(). But with cost
> of one waitqueue, the cost should be cheap, but if you persist on the 1st
> approach, I am fine to change to that.

Hello Ming,

Since the runtime overhead of the alternative approach (insert into queue) is
significantly smaller and since it will result in a simpler implementation I
prefer the alternative approach.

Thanks,

Bart.
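
A minimal sketch of that alternative (illustrative only, not the code that was
eventually merged; blk_mq_sched_insert_request() and its argument order follow
the blk-mq code of that era and should be treated as an assumption):

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Direct issue bails out to the insert path when the queue is quiesced,
 * so no extra waitqueue is needed; blk_mq_unquiesce_queue() only has to
 * re-run the hardware queues afterwards. */
static void sketch_try_issue_directly(struct request *rq, bool may_sleep)
{
	struct request_queue *q = rq->q;

	if (blk_queue_quiesced(q) || q->elevator)
		goto insert;

	/* ... try ->queue_rq() directly, fall back to insert on BUSY ... */
	return;
insert:
	blk_mq_sched_insert_request(rq, false, true, false, may_sleep);
}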

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-06-01 23:09         ` Bart Van Assche
@ 2017-06-02  2:00           ` Ming Lei
  -1 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-06-02  2:00 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi, hch, linux-block, linux-nvme, axboe, dm-devel

On Thu, Jun 01, 2017 at 11:09:00PM +0000, Bart Van Assche wrote:
> On Thu, 2017-06-01 at 08:54 +0800, Ming Lei wrote:
> > On Wed, May 31, 2017 at 03:21:41PM +0000, Bart Van Assche wrote:
> > > On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > index 99e16ac479e3..ffcf05765e2b 100644
> > > > --- a/drivers/scsi/scsi_lib.c
> > > > +++ b/drivers/scsi/scsi_lib.c
> > > > @@ -3031,7 +3031,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (q->mq_ops) {
> > > > -		blk_mq_start_stopped_hw_queues(q, false);
> > > > +		if (blk_queue_quiesced(q))
> > > > +			blk_mq_unquiesce_queue(q);
> > > > +		else
> > > > +			blk_mq_start_stopped_hw_queues(q, false);
> > > >  	} else {
> > > >  		spin_lock_irqsave(q->queue_lock, flags);
> > > >  		blk_start_queue(q);
> > > 
> > > As I commented on v2, this change is really wrong. All what's needed here is
> > > a call to blk_mq_unquiesce_queue() and nothing else. Adding a call to
> > > blk_mq_start_stopped_hw_queues() is wrong because it makes it impossible to
> > > use the STOPPED flag in the SCSI core to make the block layer core stop calling
> > > .queue_rq() if a SCSI LLD returns "busy".
> > 
> > I am not sure if I understand your idea, could you explain a bit why it is wrong?
> > 
> > Let's see the function of scsi_internal_device_block():
> > 
> > 	if (q->mq_ops) {
> >                 if (wait)
> >                         blk_mq_quiesce_queue(q);
> >                 else
> >                         blk_mq_stop_hw_queues(q);
> > 	}
> > 
> > So the queue may be put into quiesced if 'wait' is true, or it is
> > stopped if 'wait' is false.
> > 
> > And this patch just makes the two SCSI APIs symmetrical.
> > 
> > Since we will not stop queue in blk_mq_quiesce_queue() later,
> > I have to unquiese one queue only if it is quiesced.
> > 
> > So suppose the queue is put into stopped in scsi_internal_device_block(),
> > do we expect not to restart it in scsi_internal_device_unblock() via
> > blk_mq_unquiesce_queue()?
> 
> Hello Ming,
> 
> My opinion is that scsi_internal_device_block() and scsi_internal_device_unblock()
> should be changed as follows for the scsi-mq code path:
> * scsi_internal_device_block() should call blk_mq_quiesce_queue(q) if the "wait"
>   argument is true and should set the QUEUE_FLAG_QUIESCED flag if the "wait"
>   argument is false.
> * scsi_internal_device_unblock() should call blk_mq_unquiesce_queue().
> 
> I am aware it sounds weird to set the QUEUE_FLAG_QUIESCED flag without waiting
> until ongoing .queue_rq() calls have finished. The only driver that triggers

This approach may break the contract of blk_mq_quiesce_queue() and is really weird.
Also, the above change isn't really related to this patchset, so it would be
better to do whatever you want once this patch is merged.

This patchset won't break the current SCSI uses of blk_mq_quiesce_queue() and
won't cause a regression. I set up an iSCSI target yesterday and it works
just fine, so how about keeping this patch as it is?

Once it is merged, you can figure out a better solution, but
the contract of blk_mq_quiesce_queue() still shouldn't be broken.

Thanks,
Ming
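
For context, the hunk quoted above relies on the QUEUE_FLAG_QUIESCED flag and
the blk_queue_quiesced() helper introduced earlier in this series. A rough
sketch of what that amounts to in include/linux/blkdev.h (the flag's bit
number below is illustrative, not the one used in the actual patch):

#define QUEUE_FLAG_QUIESCED	28	/* queue has been quiesced */

#define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)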

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue
  2017-06-01 23:19       ` Bart Van Assche
@ 2017-06-02  2:02         ` Ming Lei
  0 siblings, 0 replies; 29+ messages in thread
From: Ming Lei @ 2017-06-02  2:02 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, axboe

On Thu, Jun 01, 2017 at 11:19:11PM +0000, Bart Van Assche wrote:
> On Thu, 2017-06-01 at 10:21 +0800, Ming Lei wrote:
> > On Wed, May 31, 2017 at 03:37:30PM +0000, Bart Van Assche wrote:
> > > On Wed, 2017-05-31 at 20:37 +0800, Ming Lei wrote:
> > > > 
> > > > +	/* wait until queue is unquiesced */
> > > > +	wait_event_cmd(q->quiesce_wq, !blk_queue_quiesced(q),
> > > > +			may_sleep ?
> > > > +			srcu_read_unlock(&hctx->queue_rq_srcu, *srcu_idx) :
> > > > +			rcu_read_unlock(),
> > > > +			may_sleep ?
> > > > +			*srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu) :
> > > > +			rcu_read_lock());
> > > > +
> > > >  	if (q->elevator)
> > > >  		goto insert;
> > > 
> > > What I see is that in this patch a new waitqueue has been introduced
> > > (quiesce_wq) and also that an explanation of why you think this new waitqueue
> > > is needed is missing completely. Why is it that you think that the
> > > synchronize_scru() and synchronize_rcu() calls in blk_mq_quiesce_queue() are
> > > not sufficient? If this new waitqueue is not needed, please remove that
> > > waitqueue again.
> > 
> > OK, the reason is simple, and it is only related with direct issue.
> > 
> > Under this situation, when the queue is quiesced, we have to
> > 
> > 	- insert the current request into sw queue(scheduler queue)
> > 	OR
> > 	-wait until queue becomes unquiesced like what this patch is doing
> > 
> > The disadvantage of the 1st way is that we have to consider to run queue
> > again in blk_mq_unquiesce_queue() for the queued requests during quiescing.
> > 
> > For the 2nd way(what this patch is doing), one benefit is that application
> > can avoid to submit I/O to a quiesced queue. Another benefit is that we
> > needn't to consider to run queue in blk_mq_unquiesce_queue(). But with cost
> > of one waitqueue, the cost should be cheap, but if you persist on the 1st
> > approach, I am fine to change to that.
> 
> Hello Ming,
> 
> Since the runtime overhead of the alternative approach (insert into queue) is
> significantly smaller and since it will result in a simpler implementation I
> prefer the alternative approach.

OK, no problem, I will switch to the insert & run queue approach.

Thanks,
Ming
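
"Insert & run queue" implies that requests can now accumulate in the software
and scheduler queues while the queue is quiesced, so blk_mq_unquiesce_queue()
has to kick the hardware queues after clearing the flag. A minimal sketch,
assuming the QUEUE_FLAG_QUIESCED flag and the queue_lock conventions of that
era:

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Sketch only: clear the quiesced flag, then re-run the hw queues so that
 * requests inserted while the queue was quiesced get dispatched. */
void sketch_blk_mq_unquiesce_queue(struct request_queue *q)
{
	spin_lock_irq(q->queue_lock);
	queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
	spin_unlock_irq(q->queue_lock);

	blk_mq_run_hw_queues(q, true);	/* true == run asynchronously */
}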

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue()
  2017-06-02  2:00           ` Ming Lei
  (?)
@ 2017-06-02 22:19             ` Bart Van Assche
  -1 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-06-02 22:19 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-scsi, hch, linux-block, linux-nvme, axboe, dm-devel

On Fri, 2017-06-02 at 10:00 +0800, Ming Lei wrote:
> On Thu, Jun 01, 2017 at 11:09:00PM +0000, Bart Van Assche wrote:
> > My opinion is that scsi_internal_device_block() and scsi_internal_device_unblock()
> > should be changed as follows for the scsi-mq code path:
> > * scsi_internal_device_block() should call blk_mq_quiesce_queue(q) if the "wait"
> >   argument is true and should set the QUEUE_FLAG_QUIESCED flag if the "wait"
> >   argument is false.
> > * scsi_internal_device_unblock() should call blk_mq_unquiesce_queue().
> > 
> > I am aware it sounds weird to set the QUEUE_FLAG_QUIESCED flag without waiting
> > until ongoing .queue_rq() calls have finished. The only driver that triggers
> 
> This approach may break the contract of blk_mq_quiesce_queue() and is really weird.
> Also, the above change isn't really related to this patchset, so it would be
> better to do whatever you want once this patch is merged.

No, what I proposed doesn't break the contract of blk_mq_quiesce_queue().
The contract of that function is that, once it has returned, all pending
.queue_rq() calls have finished and no new .queue_rq() calls will occur
until blk_mq_unquiesce_queue() is called. Since the wait == false path does
not call blk_mq_quiesce_queue(), that contract does not apply.

> This patchset won't break the current SCSI uses of blk_mq_quiesce_queue() and
> won't cause a regression. I set up an iSCSI target yesterday and it works
> just fine, so how about keeping this patch as it is?

As I explained in a previous e-mail, the iSCSI initiator driver does not
trigger the wait == false path, so your test did not exercise that code path.

Something else I explained in a previous e-mail is that your patch makes it
impossible to use the STOPPED flag in the SCSI core for what it was originally
intended, namely preventing the block layer from repeatedly trying to queue
requests while the block driver knows that it will have to return "BUSY".
This is why I asked you to modify the SCSI integration of your patch.

Bart.
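
Concretely, the proposal above amounts to something like the following sketch
for the scsi-mq path. The blk_mq_quiesce_queue_nowait() helper is hypothetical
here: it only sets QUEUE_FLAG_QUIESCED, without waiting for ongoing
.queue_rq() calls, and the sdev state handling is elided.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <scsi/scsi_device.h>

/* Hypothetical helper: mark the queue quiesced without draining ->queue_rq(). */
static void blk_mq_quiesce_queue_nowait(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	queue_flag_set(QUEUE_FLAG_QUIESCED, q);
	spin_unlock_irqrestore(q->queue_lock, flags);
}

static void sketch_scsi_block(struct scsi_device *sdev, bool wait)
{
	struct request_queue *q = sdev->request_queue;

	if (q->mq_ops) {
		if (wait)
			blk_mq_quiesce_queue(q);	/* waits for ->queue_rq() */
		else
			blk_mq_quiesce_queue_nowait(q);	/* safe in atomic context */
	}
}

static void sketch_scsi_unblock(struct scsi_device *sdev)
{
	struct request_queue *q = sdev->request_queue;

	if (q->mq_ops)
		blk_mq_unquiesce_queue(q);	/* single unblock path */
}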

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-06-02 22:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 12:36 [PATCH v3 0/8] blk-mq: fix & improve queue quiescing Ming Lei
2017-05-31 12:36 ` [PATCH v3 1/9] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
2017-05-31 12:36 ` [PATCH v3 2/9] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
2017-05-31 12:37 ` [PATCH v3 3/9] blk-mq: use the introduced blk_mq_unquiesce_queue() Ming Lei
2017-05-31 12:37   ` Ming Lei
2017-05-31 15:21   ` Bart Van Assche
2017-05-31 15:21     ` Bart Van Assche
2017-05-31 15:21     ` Bart Van Assche
2017-06-01  0:54     ` Ming Lei
2017-06-01  0:54       ` Ming Lei
2017-06-01 23:09       ` Bart Van Assche
2017-06-01 23:09         ` Bart Van Assche
2017-06-01 23:09         ` Bart Van Assche
2017-06-02  2:00         ` Ming Lei
2017-06-02  2:00           ` Ming Lei
2017-06-02 22:19           ` Bart Van Assche
2017-06-02 22:19             ` Bart Van Assche
2017-06-02 22:19             ` Bart Van Assche
2017-05-31 12:37 ` [PATCH v3 4/9] nvme: host: unquiesce queue in nvme_kill_queues() Ming Lei
2017-05-31 12:37   ` Ming Lei
2017-05-31 12:37 ` [PATCH v3 5/9] blk-mq: fix blk_mq_quiesce_queue Ming Lei
2017-05-31 15:37   ` Bart Van Assche
2017-06-01  2:21     ` Ming Lei
2017-06-01 23:19       ` Bart Van Assche
2017-06-02  2:02         ` Ming Lei
2017-05-31 12:37 ` [PATCH v3 6/9] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
2017-05-31 12:37 ` [PATCH v3 7/9] blk-mq: don't stop queue for quiescing Ming Lei
2017-05-31 12:37 ` [PATCH v3 8/9] blk-mq: clarify dispatch may not be drained/blocked by stopping queue Ming Lei
2017-05-31 12:37 ` [PATCH v3 9/9] Revert "blk-mq: don't use sync workqueue flushing from drivers" Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.