* [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
@ 2017-09-25 20:29 Bart Van Assche
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
                   ` (7 more replies)
  0 siblings, 8 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche

Hello Jens,

It is known that the system sometimes hangs instead of coming up
properly during the resume that follows a hibernate, especially when
using an md RAID1 array created on top of SCSI devices. This patch
series fixes that problem. It is an alternative to Ming Lei's "[PATCH
V5 0/10] block/scsi: safe SCSI quiescing" patch series. The advantages
of this patch series are:
- Easier to review, because no new race conditions are introduced
  between queue freezing and blk_cleanup_queue(). As the discussion
  that followed Ming's patch series shows, the correctness of the new
  code is hard to verify.
- No new freeze modes and hence no new freeze mode variables.

These patches have been tested on top of a merge of the block layer
for-next branch and Linus' master tree. Linus' master tree includes
the patch "KVM: x86: Fix the NULL pointer parameter in check_cr_write()",
which the block layer for-next branch does not yet contain.

Please consider these changes for kernel v4.15.

Thanks,

Bart.

Changes between v3 and v4:
- Made sure that this patch series not only works for scsi-mq but also for
  the legacy SCSI stack.
- Removed an smp_rmb()/smp_wmb() pair from the hot path and added a comment
  that explains why that is safe.
- Reordered the patches in this patch series.

Changes between v2 and v3:
- Made md kernel threads freezable.
- Changed the approach for quiescing SCSI devices again.
- Addressed Ming's review comments.

Changes compared to v1 of this patch series:
- Changed the approach and rewrote the patch series.

Bart Van Assche (6):
  md: Make md resync and reshape threads freezable
  block: Convert RQF_PREEMPT into REQ_PREEMPT
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  scsi-mq: Reduce suspend latency
  scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced
  block: Make SCSI device suspend and resume work reliably

Ming Lei (1):
  block: Make q_usage_counter also track legacy requests

 block/blk-core.c          | 60 +++++++++++++++++++++++++++++++++++++++++------
 block/blk-mq-debugfs.c    |  2 +-
 block/blk-mq.c            | 14 ++++-------
 block/blk-timeout.c       |  2 +-
 drivers/ide/ide-atapi.c   |  3 +--
 drivers/ide/ide-io.c      |  2 +-
 drivers/ide/ide-pm.c      |  4 ++--
 drivers/md/md.c           | 21 ++++++++++-------
 drivers/scsi/scsi_lib.c   | 30 +++++++++++++++---------
 fs/block_dev.c            |  4 ++--
 include/linux/blk_types.h |  6 +++++
 include/linux/blkdev.h    | 10 ++++----
 12 files changed, 109 insertions(+), 49 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-09-25 23:04   ` Ming Lei
                     ` (3 more replies)
  2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
                   ` (6 subsequent siblings)
  7 siblings, 4 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Shaohua Li, linux-raid,
	Ming Lei, Hannes Reinecke, Johannes Thumshirn

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that the submission of
new I/O requests stops before device quiescing starts, make the md
resync and reshape threads freezable.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..26a12bd0db65 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7424,6 +7425,7 @@ static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7434,7 +7436,7 @@ static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8133,6 +8135,8 @@ void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8324,7 +8328,7 @@ void md_do_sync(struct md_thread *thread)
 		     mddev->curr_resync_completed > mddev->resync_max
 			    )) {
 			/* time to update curr_resync_completed */
-			wait_event(mddev->recovery_wait,
+			wait_event_freezable(mddev->recovery_wait,
 				   atomic_read(&mddev->recovery_active) == 0);
 			mddev->curr_resync_completed = j;
 			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
@@ -8342,10 +8346,10 @@ void md_do_sync(struct md_thread *thread)
 			 * to avoid triggering warnings.
 			 */
 			flush_signals(current); /* just in case */
-			wait_event_interruptible(mddev->recovery_wait,
-						 mddev->resync_max > j
-						 || test_bit(MD_RECOVERY_INTR,
-							     &mddev->recovery));
+			wait_event_freezable(mddev->recovery_wait,
+					     mddev->resync_max > j ||
+					     test_bit(MD_RECOVERY_INTR,
+						      &mddev->recovery));
 		}
 
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
@@ -8421,7 +8425,7 @@ void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
-				wait_event(mddev->recovery_wait,
+				wait_event_freezable(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
 			}
 		}
@@ -8433,7 +8437,8 @@ void md_do_sync(struct md_thread *thread)
 	 * this also signals 'finished resyncing' to md_stop
 	 */
 	blk_finish_plug(&plug);
-	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
+	wait_event_freezable(mddev->recovery_wait,
+			     !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
-- 
2.14.1

* [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-09-26  6:05   ` Hannes Reinecke
                     ` (2 more replies)
  2017-09-25 20:29 ` [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT Bart Van Assche
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

From: Ming Lei <ming.lei@redhat.com>

This patch makes it possible to pause request allocation for
the legacy block layer by calling blk_mq_freeze_queue() and
blk_mq_unfreeze_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[ bvanassche: Combined two patches into one, edited a comment and made sure
  REQ_NOWAIT is handled properly in blk_old_get_request() ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c | 12 ++++++++++++
 block/blk-mq.c   | 10 ++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index aebe676225e6..fef7133f8b0e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,9 @@ void blk_set_queue_dying(struct request_queue *q)
 		}
 		spin_unlock_irq(q->queue_lock);
 	}
+
+	/* Make blk_queue_enter() reexamine the DYING flag. */
+	wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1392,16 +1395,22 @@ static struct request *blk_old_get_request(struct request_queue *q,
 					   unsigned int op, gfp_t gfp_mask)
 {
 	struct request *rq;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
 
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
+			      (op & REQ_NOWAIT));
+	if (ret)
+		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
+		blk_queue_exit(q);
 		return rq;
 	}
 
@@ -1573,6 +1582,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		blk_free_request(rl, req);
 		freed_request(rl, sync, rq_flags);
 		blk_put_rl(rl);
+		blk_queue_exit(q);
 	}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1854,8 +1864,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Grab a free request. This is might sleep but can not fail.
 	 * Returns with the queue unlocked.
 	 */
+	blk_queue_enter_live(q);
 	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
+		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98a18609755e..10c1f49f663d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@@ -255,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		if (blk_mq_hw_queue_mapped(hctx))
 			blk_mq_tag_wakeup_all(hctx->tags, true);
-
-	/*
-	 * If we are called because the queue has now been marked as
-	 * dying, we need to ensure that processes currently waiting on
-	 * the queue are notified as well.
-	 */
-	wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.14.1

* [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
  2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-10-02 13:42   ` Christoph Hellwig
  2017-09-25 20:29 ` [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

This patch does not change any functionality but makes the
REQ_PREEMPT flag available to blk_get_request(). A later patch
will add code to blk_get_request() that checks the REQ_PREEMPT
flag. Note: the IDE sense_rq request is allocated statically so
there is no blk_get_request() call that corresponds to this
request.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-mq-debugfs.c    |  2 +-
 drivers/ide/ide-atapi.c   |  3 +--
 drivers/ide/ide-io.c      |  2 +-
 drivers/ide/ide-pm.c      |  4 ++--
 drivers/scsi/scsi_lib.c   | 11 ++++++-----
 include/linux/blk_types.h |  6 ++++++
 include/linux/blkdev.h    |  3 ---
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 980e73095643..62ac248a4984 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -266,6 +266,7 @@ static const char *const cmd_flag_name[] = {
 	CMD_FLAG_NAME(BACKGROUND),
 	CMD_FLAG_NAME(NOUNMAP),
 	CMD_FLAG_NAME(NOWAIT),
+	CMD_FLAG_NAME(PREEMPT),
 };
 #undef CMD_FLAG_NAME
 
@@ -279,7 +280,6 @@ static const char *const rqf_name[] = {
 	RQF_NAME(MIXED_MERGE),
 	RQF_NAME(MQ_INFLIGHT),
 	RQF_NAME(DONTPREP),
-	RQF_NAME(PREEMPT),
 	RQF_NAME(COPY_USER),
 	RQF_NAME(FAILED),
 	RQF_NAME(QUIET),
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 14d1e7d9a1d6..1258739d5fa1 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -211,9 +211,8 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
 	}
 
 	sense_rq->rq_disk = rq->rq_disk;
-	sense_rq->cmd_flags = REQ_OP_DRV_IN;
+	sense_rq->cmd_flags = REQ_OP_DRV_IN | REQ_PREEMPT;
 	ide_req(sense_rq)->type = ATA_PRIV_SENSE;
-	sense_rq->rq_flags |= RQF_PREEMPT;
 
 	req->cmd[0] = GPCMD_REQUEST_SENSE;
 	req->cmd[4] = cmd_len;
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 3a234701d92c..06ffccd0fb10 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -539,7 +539,7 @@ void do_ide_request(struct request_queue *q)
 		 */
 		if ((drive->dev_flags & IDE_DFLAG_BLOCKED) &&
 		    ata_pm_request(rq) == 0 &&
-		    (rq->rq_flags & RQF_PREEMPT) == 0) {
+		    (rq->cmd_flags & REQ_PREEMPT) == 0) {
 			/* there should be no pending command at this point */
 			ide_unlock_port(hwif);
 			goto plug_device;
diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..f8d2709dcd56 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,9 +89,9 @@ int generic_ide_resume(struct device *dev)
 	}
 
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN | REQ_PREEMPT,
+			     __GFP_RECLAIM);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
-	rq->rq_flags |= RQF_PREEMPT;
 	rq->special = &rqpm;
 	rqpm.pm_step = IDE_PM_START_RESUME;
 	rqpm.pm_state = PM_EVENT_ON;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..62f905b22821 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -253,8 +253,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	int ret = DRIVER_ERROR << 24;
 
 	req = blk_get_request(sdev->request_queue,
-			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			      (data_direction == DMA_TO_DEVICE ?
+			       REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN) | REQ_PREEMPT,
+			      __GFP_RECLAIM);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -268,7 +269,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	rq->retries = retries;
 	req->timeout = timeout;
 	req->cmd_flags |= flags;
-	req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT;
+	req->rq_flags |= rq_flags | RQF_QUIET;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
@@ -1301,7 +1302,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 			/*
 			 * If the devices is blocked we defer normal commands.
 			 */
-			if (!(req->rq_flags & RQF_PREEMPT))
+			if (!(req->cmd_flags & REQ_PREEMPT))
 				ret = BLKPREP_DEFER;
 			break;
 		default:
@@ -1310,7 +1311,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 			 * special commands.  In particular any user initiated
 			 * command is not allowed.
 			 */
-			if (!(req->rq_flags & RQF_PREEMPT))
+			if (!(req->cmd_flags & REQ_PREEMPT))
 				ret = BLKPREP_KILL;
 			break;
 		}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa709cef..b6b6bf44e462 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -229,6 +229,11 @@ enum req_flag_bits {
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
 
 	__REQ_NOWAIT,           /* Don't wait if request will block */
+	/*
+	 * set for "ide_preempt" requests and also for requests for which the
+	 * SCSI "quiesce" state must be ignored.
+	 */
+	__REQ_PREEMPT,
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -248,6 +253,7 @@ enum req_flag_bits {
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
+#define REQ_PREEMPT		(1ULL << __REQ_PREEMPT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 460294bb0fa5..cd26901a6e25 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -96,9 +96,6 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_INFLIGHT		((__force req_flags_t)(1 << 6))
 /* don't call prep for this one */
 #define RQF_DONTPREP		((__force req_flags_t)(1 << 7))
-/* set for "ide_preempt" requests and also for requests for which the SCSI
-   "quiesce" state must be ignored. */
-#define RQF_PREEMPT		((__force req_flags_t)(1 << 8))
 /* contains copies of user pages */
 #define RQF_COPY_USER		((__force req_flags_t)(1 << 9))
 /* vaguely specified driver internal error.  Ignored by the block layer */
-- 
2.14.1

* [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
                   ` (2 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-10-02 13:47   ` Christoph Hellwig
  2017-09-25 20:29 ` [PATCH v4 5/7] scsi: Reduce suspend latency Bart Van Assche
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
A quiesced SCSI queue only processes REQ_PREEMPT requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 13 +++++++++++++
 include/linux/blkdev.h |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index fef7133f8b0e..9111a8f9c7a1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -346,6 +346,19 @@ void blk_sync_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
+void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	if (preempt_only)
+		queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	else
+		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+EXPORT_SYMBOL(blk_set_preempt_only);
+
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q:	The queue to run
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cd26901a6e25..5bd87599eed0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -627,6 +627,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
+#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -731,6 +732,10 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
+#define blk_queue_preempt_only(q)				\
+	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+
+extern void blk_set_preempt_only(struct request_queue *q, bool preempt_only);
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.14.1

* [PATCH v4 5/7] scsi: Reduce suspend latency
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
                   ` (3 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-09-26 22:23   ` Ming Lei
  2017-09-25 20:29 ` [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

Avoid that waiting for ongoing requests to finish can take up to
200 ms longer than necessary. Note: blk_mq_freeze_queue() uses a wait
queue to wait for ongoing requests to finish.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_lib.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 62f905b22821..c261498606c4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2927,6 +2927,7 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
+	struct request_queue *q = sdev->request_queue;
 	int err;
 
 	mutex_lock(&sdev->state_mutex);
@@ -2936,11 +2937,8 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	if (err)
 		return err;
 
-	scsi_run_queue(sdev->request_queue);
-	while (atomic_read(&sdev->device_busy)) {
-		msleep_interruptible(200);
-		scsi_run_queue(sdev->request_queue);
-	}
+	blk_mq_freeze_queue(q);
+	blk_mq_unfreeze_queue(q);
 	return 0;
 }
 EXPORT_SYMBOL(scsi_device_quiesce);
-- 
2.14.1

* [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
                   ` (4 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH v4 5/7] scsi: Reduce suspend latency Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-10-02 13:53   ` Christoph Hellwig
  2017-09-25 20:29 ` [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably Bart Van Assche
  2017-09-26  9:15 ` [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Ming Lei
  7 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

Make the quiesce state visible to the block layer for the next
patch in this series.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_lib.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c261498606c4..309369d8fe01 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2932,6 +2932,8 @@ scsi_device_quiesce(struct scsi_device *sdev)
 
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	if (err == 0)
+		blk_set_preempt_only(q, true);
 	mutex_unlock(&sdev->state_mutex);
 
 	if (err)
@@ -2960,8 +2962,10 @@ void scsi_device_resume(struct scsi_device *sdev)
 	 */
 	mutex_lock(&sdev->state_mutex);
 	if (sdev->sdev_state == SDEV_QUIESCE &&
-	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
+	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0) {
+		blk_set_preempt_only(sdev->request_queue, false);
 		scsi_run_queue(sdev->request_queue);
+	}
 	mutex_unlock(&sdev->state_mutex);
 }
 EXPORT_SYMBOL(scsi_device_resume);
-- 
2.14.1

* [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
                   ` (5 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced Bart Van Assche
@ 2017-09-25 20:29 ` Bart Van Assche
  2017-09-25 22:59   ` Ming Lei
  2017-09-26  9:15 ` [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Ming Lei
  7 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Bart Van Assche, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why SCSI
devices are quiesced while the hibernation image is being written or
restored. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state, execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
from being submitted by deferring allocation of non-preempt requests
for devices in the quiesced state. This patch has been tested by
running the following commands and by verifying that after resume the
fio job is still running:

for d in /sys/class/block/sd*[a-z]; do
  hcil=$(readlink "$d/device")
  hcil=${hcil#../../../}
  echo 4 > "$d/queue/nr_requests"
  echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
done
bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
bdev=${bdev#../../}
hcil=$(readlink "/sys/block/$bdev/device")
hcil=${hcil#../../../}
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
  --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
  --loops=$((2**31)) &
pid=$!
sleep 1
systemctl hibernate
sleep 10
kill $pid

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 41 +++++++++++++++++++++++++++++++----------
 block/blk-mq.c         |  4 ++--
 block/blk-timeout.c    |  2 +-
 fs/block_dev.c         |  4 ++--
 include/linux/blkdev.h |  2 +-
 5 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9111a8f9c7a1..01b7afee58f0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -351,10 +351,12 @@ void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
 	unsigned long flags;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	if (preempt_only)
+	if (preempt_only) {
 		queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
-	else
+	} else {
 		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+		wake_up_all(&q->mq_freeze_wq);
+	}
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL(blk_set_preempt_only);
@@ -776,13 +778,31 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+/**
+ * blk_queue_enter() - try to increase q->q_usage_counter
+ * @q: request queue pointer
+ * @nowait: if the queue is frozen, do not wait until it is unfrozen
+ * @preempt: if QUEUE_FLAG_PREEMPT_ONLY has been set, do not wait until that
+ *	flag has been cleared
+ */
+int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
 {
 	while (true) {
 		int ret;
 
-		if (percpu_ref_tryget_live(&q->q_usage_counter))
-			return 0;
+		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
+			/*
+			 * Since setting the PREEMPT_ONLY flag is followed
+			 * by a switch of q_usage_counter from per-cpu to
+			 * atomic mode and back to per-cpu and since the
+			 * switch to atomic mode uses call_rcu_sched(), it
+			 * is not necessary to call smp_rmb() here.
+			 */
+			if (preempt || !blk_queue_preempt_only(q))
+				return 0;
+			else
+				percpu_ref_put(&q->q_usage_counter);
+		}
 
 		if (nowait)
 			return -EBUSY;
@@ -797,7 +817,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 		smp_rmb();
 
 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				(atomic_read(&q->mq_freeze_depth) == 0 &&
+				 (preempt || !blk_queue_preempt_only(q))) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
@@ -1416,7 +1437,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	create_io_context(gfp_mask, q->node);
 
 	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
-			      (op & REQ_NOWAIT));
+			      (op & REQ_NOWAIT), op & REQ_PREEMPT);
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
@@ -2184,6 +2205,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	 */
 	struct bio_list bio_list_on_stack[2];
 	blk_qc_t ret = BLK_QC_T_NONE;
+	const bool nowait = bio->bi_opf & REQ_NOWAIT;
 
 	if (!generic_make_request_checks(bio))
 		goto out;
@@ -2223,7 +2245,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
 
-		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
+		if (likely(blk_queue_enter(q, nowait, false) == 0)) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
@@ -2248,8 +2270,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 			bio_list_merge(&bio_list_on_stack[0], &same);
 			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
 		} else {
-			if (unlikely(!blk_queue_dying(q) &&
-					(bio->bi_opf & REQ_NOWAIT)))
+			if (unlikely(!blk_queue_dying(q) && nowait))
 				bio_wouldblock_error(bio);
 			else
 				bio_io_error(bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 10c1f49f663d..cbf7bf7e3d13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -384,7 +384,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	struct request *rq;
 	int ret;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT, op & REQ_PREEMPT);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -423,7 +423,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	if (hctx_idx >= q->nr_hw_queues)
 		return ERR_PTR(-EIO);
 
-	ret = blk_queue_enter(q, true);
+	ret = blk_queue_enter(q, true, op & REQ_PREEMPT);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 17ec83bb0900..0dfdc975473a 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
 	struct request *rq, *tmp;
 	int next_set = 0;
 
-	if (blk_queue_enter(q, true))
+	if (blk_queue_enter(q, true, true))
 		return;
 	spin_lock_irqsave(q->queue_lock, flags);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..e9ca45087a40 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return result;
 
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, false, false);
 	if (result)
 		return result;
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return -EOPNOTSUPP;
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, false, false);
 	if (result)
 		return result;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5bd87599eed0..a025597c378a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,7 +964,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, bool nowait);
+extern int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably
  2017-09-25 20:29 ` [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably Bart Van Assche
@ 2017-09-25 22:59   ` Ming Lei
  2017-09-25 23:13     ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-25 22:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Hannes Reinecke, Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:24PM -0700, Bart Van Assche wrote:
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why while
> the hibernation image is being written or restored that SCSI devices
> are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
> and scsi_device_resume(). In the SDEV_QUIESCE state execution of
> non-preempt requests is deferred. This is realized by returning
> BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
> devices. Avoid that a full queue prevents power management requests
> to be submitted by deferring allocation of non-preempt requests for
> devices in the quiesced state. This patch has been tested by running
> the following commands and by verifying that after resume the fio job
> is still running:
> 
> for d in /sys/class/block/sd*[a-z]; do
>   hcil=$(readlink "$d/device")
>   hcil=${hcil#../../../}
>   echo 4 > "$d/queue/nr_requests"
>   echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
> done
> bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
> bdev=${bdev#../../}
> hcil=$(readlink "/sys/block/$bdev/device")
> hcil=${hcil#../../../}
> fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
>   --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
>   --loops=$((2**31)) &
> pid=$!
> sleep 1
> systemctl hibernate
> sleep 10
> kill $pid
> 
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-core.c       | 41 +++++++++++++++++++++++++++++++----------
>  block/blk-mq.c         |  4 ++--
>  block/blk-timeout.c    |  2 +-
>  fs/block_dev.c         |  4 ++--
>  include/linux/blkdev.h |  2 +-
>  5 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9111a8f9c7a1..01b7afee58f0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -351,10 +351,12 @@ void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(q->queue_lock, flags);
> -	if (preempt_only)
> +	if (preempt_only) {
>  		queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
> -	else
> +	} else {
>  		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> +		wake_up_all(&q->mq_freeze_wq);
> +	}
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  EXPORT_SYMBOL(blk_set_preempt_only);
> @@ -776,13 +778,31 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(blk_alloc_queue);
>  
> -int blk_queue_enter(struct request_queue *q, bool nowait)
> +/**
> + * blk_queue_enter() - try to increase q->q_usage_counter
> + * @q: request queue pointer
> + * @nowait: if the queue is frozen, do not wait until it is unfrozen
> + * @preempt: if QUEUE_FLAG_PREEMPT_ONLY has been set, do not wait until that
> + *	flag has been cleared
> + */
> +int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
>  {
>  	while (true) {
>  		int ret;
>  
> -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> -			return 0;
> +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> +			/*
> +			 * Since setting the PREEMPT_ONLY flag is followed
> +			 * by a switch of q_usage_counter from per-cpu to
> +			 * atomic mode and back to per-cpu and since the
> +			 * switch to atomic mode uses call_rcu_sched(), it
> +			 * is not necessary to call smp_rmb() here.
> +			 */

rcu_read_lock is held only inside percpu_ref_tryget_live().

Without an explicit barrier (smp_mb()) between getting the refcounter
and reading the preempt-only flag, the two operations (the write to the
refcounter and the read of the flag) can be reordered, so
freeze/unfreeze may complete before this I/O is completed.

> +			if (preempt || !blk_queue_preempt_only(q))
> +				return 0;
> +			else
> +				percpu_ref_put(&q->q_usage_counter);
> +		}
>  
>  		if (nowait)
>  			return -EBUSY;
> @@ -797,7 +817,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
>  		smp_rmb();
>  
>  		ret = wait_event_interruptible(q->mq_freeze_wq,
> -				!atomic_read(&q->mq_freeze_depth) ||
> +				(atomic_read(&q->mq_freeze_depth) == 0 &&
> +				 (preempt || !blk_queue_preempt_only(q))) ||
>  				blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> @@ -1416,7 +1437,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
>  	create_io_context(gfp_mask, q->node);
>  
>  	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
> -			      (op & REQ_NOWAIT));
> +			      (op & REQ_NOWAIT), op & REQ_PREEMPT);
>  	if (ret)
>  		return ERR_PTR(ret);
>  	spin_lock_irq(q->queue_lock);
> @@ -2184,6 +2205,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 */
>  	struct bio_list bio_list_on_stack[2];
>  	blk_qc_t ret = BLK_QC_T_NONE;
> +	const bool nowait = bio->bi_opf & REQ_NOWAIT;
>  
>  	if (!generic_make_request_checks(bio))
>  		goto out;
> @@ -2223,7 +2245,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	do {
>  		struct request_queue *q = bio->bi_disk->queue;
>  
> -		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
> +		if (likely(blk_queue_enter(q, nowait, false) == 0)) {
>  			struct bio_list lower, same;
>  
>  			/* Create a fresh bio_list for all subordinate requests */
> @@ -2248,8 +2270,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>  			bio_list_merge(&bio_list_on_stack[0], &same);
>  			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
>  		} else {
> -			if (unlikely(!blk_queue_dying(q) &&
> -					(bio->bi_opf & REQ_NOWAIT)))
> +			if (unlikely(!blk_queue_dying(q) && nowait))
>  				bio_wouldblock_error(bio);
>  			else
>  				bio_io_error(bio);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 10c1f49f663d..cbf7bf7e3d13 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -384,7 +384,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>  	struct request *rq;
>  	int ret;
>  
> -	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> +	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT, op & REQ_PREEMPT);
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> @@ -423,7 +423,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  	if (hctx_idx >= q->nr_hw_queues)
>  		return ERR_PTR(-EIO);
>  
> -	ret = blk_queue_enter(q, true);
> +	ret = blk_queue_enter(q, true, op & REQ_PREEMPT);
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index 17ec83bb0900..0dfdc975473a 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
>  	struct request *rq, *tmp;
>  	int next_set = 0;
>  
> -	if (blk_queue_enter(q, true))
> +	if (blk_queue_enter(q, true, true))
>  		return;
>  	spin_lock_irqsave(q->queue_lock, flags);
>  
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 93d088ffc05c..e9ca45087a40 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
>  	if (!ops->rw_page || bdev_get_integrity(bdev))
>  		return result;
>  
> -	result = blk_queue_enter(bdev->bd_queue, false);
> +	result = blk_queue_enter(bdev->bd_queue, false, false);
>  	if (result)
>  		return result;
>  	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
> @@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
>  
>  	if (!ops->rw_page || bdev_get_integrity(bdev))
>  		return -EOPNOTSUPP;
> -	result = blk_queue_enter(bdev->bd_queue, false);
> +	result = blk_queue_enter(bdev->bd_queue, false, false);
>  	if (result)
>  		return result;
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5bd87599eed0..a025597c378a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -964,7 +964,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
>  			 struct scsi_ioctl_command __user *);
>  
> -extern int blk_queue_enter(struct request_queue *q, bool nowait);
> +extern int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt);
>  extern void blk_queue_exit(struct request_queue *q);
>  extern void blk_start_queue(struct request_queue *q);
>  extern void blk_start_queue_async(struct request_queue *q);
> -- 
> 2.14.1
> 

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
@ 2017-09-25 23:04   ` Ming Lei
  2017-09-25 23:09       ` Bart Van Assche
  2017-09-26  6:06   ` Hannes Reinecke
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-25 23:04 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Shaohua Li, linux-raid, Hannes Reinecke,
	Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.

As I explained, if SCSI quiesce is safe, this patch shouldn't
be needed.

The issue isn't MD specific, and in theory can be triggered
on all devices. And you can see the I/O hang report on BTRFS(RAID)
without MD involved:

	https://marc.info/?l=linux-block&m=150634883816965&w=2

So please remove this patch from the patchset.

--
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 23:04   ` Ming Lei
@ 2017-09-25 23:09       ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 23:09 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > Some people use the md driver on laptops and use the suspend and
> > resume functionality. Since it is essential that submitting of
> > new I/O requests stops before device quiescing starts, make the
> > md resync and reshape threads freezable.
> 
> As I explained, if SCSI quiesce is safe, this patch shouldn't
> be needed.
> 
> The issue isn't MD specific, and in theory can be triggered
> on all devices. And you can see the I/O hang report on BTRFS(RAID)
> without MD involved:
> 
> 	https://marc.info/?l=linux-block&m=150634883816965&w=2

What makes you think that this patch is not necessary once SCSI quiesce
has been made safe? Does this mean that you have not tested suspend and
resume while md RAID 1 resync was in progress? This patch is necessary
to avoid that suspend locks up while md RAID 1 resync is in progress.

Bart.
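
For context, the change being debated follows the kernel's standard
freezable-kthread pattern. The fragment below is only an illustrative
sketch of that pattern using the freezer APIs (set_freezable(),
wait_event_freezable(), try_to_freeze()); it is kernel code and not
runnable in userspace, and the names resync_thread_fn, resync_wait,
resync_work_available() and do_resync_chunk() are hypothetical
placeholders, not identifiers from the md patch:

```c
/*
 * Sketch of the freezable-kthread pattern (illustrative, not the actual
 * md diff): the thread opts in to the freezer and parks itself across
 * suspend/resume instead of keeping I/O submission alive.
 */
static int resync_thread_fn(void *arg)
{
	set_freezable();		/* opt in to the system freezer */

	while (!kthread_should_stop()) {
		/* Sleeps interruptibly and also freezes here if asked. */
		wait_event_freezable(resync_wait,
				     resync_work_available() ||
				     kthread_should_stop());
		try_to_freeze();	/* freeze point in the work loop */
		do_resync_chunk();	/* hypothetical unit of resync work */
	}
	return 0;
}
```

The key property is that a frozen thread stops submitting new I/O before
device quiescing starts, which is the ordering requirement the commit
message states.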

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably
  2017-09-25 22:59   ` Ming Lei
@ 2017-09-25 23:13     ` Bart Van Assche
  2017-09-26  8:32       ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-25 23:13 UTC (permalink / raw)
  To: ming.lei
  Cc: hch, hare, jthumshirn, linux-block, oleksandr, martin.petersen, axboe

On Tue, 2017-09-26 at 06:59 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 01:29:24PM -0700, Bart Van Assche wrote:
> > +int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
> >  {
> >  	while (true) {
> >  		int ret;
> >  
> > -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> > -			return 0;
> > +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> > +			/*
> > +			 * Since setting the PREEMPT_ONLY flag is followed
> > +			 * by a switch of q_usage_counter from per-cpu to
> > +			 * atomic mode and back to per-cpu and since the
> > +			 * switch to atomic mode uses call_rcu_sched(), it
> > +			 * is not necessary to call smp_rmb() here.
> > +			 */
> 
> rcu_read_lock is held only inside percpu_ref_tryget_live().
> 
> Without one explicit barrier(smp_mb) between getting the refcounter
> and reading the preempt only flag, the two operations(writing to
> refcounter and reading the flag) can be reordered, so
> unfreeze/unfreeze may be completed before this IO is completed.

Sorry but I disagree. I'm using RCU to achieve the same effect as a barrier
and to move the cost of the barrier from the reader to the updater. See also
Paul E. McKenney, Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett,
The RCU-barrier menagerie, LWN.net, November 12, 2013
(https://lwn.net/Articles/573497/).

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 23:09       ` Bart Van Assche
@ 2017-09-26  4:01       ` Ming Lei
  2017-09-26  8:13         ` Ming Lei
  -1 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26  4:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before device quiescing starts, make the
> > > md resync and reshape threads freezable.
> > 
> > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > be needed.
> > 
> > The issue isn't MD specific, and in theory can be triggered
> > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > without MD involved:
> > 
> > 	https://marc.info/?l=linux-block&m=150634883816965&w=2
> 
> What makes you think that this patch is not necessary once SCSI quiesce
> has been made safe? Does this mean that you have not tested suspend and

If we want to make SCSI quiesce safe, we have to drain all submitted
I/O and prevent new I/O from being submitted; that is enough to deal
with MD's resync too.

> resume while md RAID 1 resync was in progress? This patch is necessary
> to avoid that suspend locks up while md RAID 1 resync is in progress.

I tested my patchset on RAID10 while a resync was in progress and did
not see any issue during suspend/resume, without any MD changes. I will
test RAID1 later, but I don't expect any difference compared with
RAID10 because my patchset quiesces the queue completely.

The upstream report did not mention that a resync was in progress:

	https://marc.info/?l=linux-block&m=150402198315951&w=2

I want to keep the patchset (making SCSI quiesce safe) focused on this
SCSI quiesce issue.

Maybe with this MD patch the original report becomes quite difficult
to reproduce, but that doesn't mean this MD patch fixes this kind of
issue. What I really care about is that your block/SCSI changes can
fix this issue.


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests
  2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
@ 2017-09-26  6:05   ` Hannes Reinecke
  2017-09-26  8:34   ` Ming Lei
  2017-10-02 13:29   ` Christoph Hellwig
  2 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2017-09-26  6:05 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

On 09/25/2017 10:29 PM, Bart Van Assche wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> This patch makes it possible to pause request allocation for
> the legacy block layer by calling blk_mq_freeze_queue() and
> blk_mq_unfreeze_queue().
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [ bvanassche: Combined two patches into one, edited a comment and made sure
>   REQ_NOWAIT is handled properly in blk_old_get_request() ]
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-core.c | 12 ++++++++++++
>  block/blk-mq.c   | 10 ++--------
>  2 files changed, 14 insertions(+), 8 deletions(-)
> 
I actually had a customer report hitting a q_usage_counter underflow,
so this is a real bugfix. And I'd advocate pushing it independently
of the overall patchset, should the latter be delayed even further.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
  2017-09-25 23:04   ` Ming Lei
@ 2017-09-26  6:06   ` Hannes Reinecke
  2017-09-26 11:17   ` Ming Lei
  2017-10-02 13:26   ` Christoph Hellwig
  3 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2017-09-26  6:06 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Shaohua Li, linux-raid, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn

On 09/25/2017 10:29 PM, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26  4:01       ` Ming Lei
@ 2017-09-26  8:13         ` Ming Lei
  2017-09-26 14:40             ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26  8:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, Sep 26, 2017 at 12:01:03PM +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 11:09:15PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 07:04 +0800, Ming Lei wrote:
> > > On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> > > > Some people use the md driver on laptops and use the suspend and
> > > > resume functionality. Since it is essential that submitting of
> > > > new I/O requests stops before device quiescing starts, make the
> > > > md resync and reshape threads freezable.
> > > 
> > > As I explained, if SCSI quiesce is safe, this patch shouldn't
> > > be needed.
> > > 
> > > The issue isn't MD specific, and in theory can be triggered
> > > on all devices. And you can see the I/O hang report on BTRFS(RAID)
> > > without MD involved:
> > > 
> > > 	https://marc.info/?l=linux-block&m=150634883816965&w=2
> > 
> > What makes you think that this patch is not necessary once SCSI quiesce
> > has been made safe? Does this mean that you have not tested suspend and
> 
> If we want to make SCSI quiesce safe, we have to drain up all submitted
> I/O and prevent new I/O from being submitted, that is enough to deal
> with MD's resync too.
> 
> > resume while md RAID 1 resync was in progress? This patch is necessary
> > to avoid that suspend locks up while md RAID 1 resync is in progress.
> 
> I tested my patchset on RAID10 when resync in progress, not see any
> issue during suspend/resume, without any MD's change. I will test
> RAID1's later, but I don't think there is difference compared with
> RAID10 because my patchset can make the queue being quiesced totally.

I am pretty sure that suspend/resume survives while a resync is in
progress with my patchset applied on RAID1, without any MD change.

There are reports about suspend/resume on btrfs (RAID) and about the
revalidate path of scsi_transport_spi devices, so again the issue
isn't MD specific.

If your patchset depends on this MD change, something must be wrong
in the following patches. Now I need to take a close look.


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably
  2017-09-25 23:13     ` Bart Van Assche
@ 2017-09-26  8:32       ` Ming Lei
  2017-09-26 14:44         ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26  8:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, hare, jthumshirn, linux-block, oleksandr, martin.petersen, axboe

On Mon, Sep 25, 2017 at 11:13:47PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 06:59 +0800, Ming Lei wrote:
> > On Mon, Sep 25, 2017 at 01:29:24PM -0700, Bart Van Assche wrote:
> > > +int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
> > >  {
> > >  	while (true) {
> > >  		int ret;
> > >  
> > > -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> > > -			return 0;
> > > +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> > > +			/*
> > > +			 * Since setting the PREEMPT_ONLY flag is followed
> > > +			 * by a switch of q_usage_counter from per-cpu to
> > > +			 * atomic mode and back to per-cpu and since the
> > > +			 * switch to atomic mode uses call_rcu_sched(), it
> > > +			 * is not necessary to call smp_rmb() here.
> > > +			 */
> > 
> > rcu_read_lock is held only inside percpu_ref_tryget_live().
> > 
> > Without one explicit barrier(smp_mb) between getting the refcounter
> > and reading the preempt only flag, the two operations(writing to
> > refcounter and reading the flag) can be reordered, so
> > unfreeze/unfreeze may be completed before this IO is completed.
> 
> Sorry but I disagree. I'm using RCU to achieve the same effect as a barrier
> and to move the cost of the barrier from the reader to the updater. See also
> Paul E. McKenney, Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett,
> The RCU-barrier menagerie, LWN.net, November 12, 2013
> (https://lwn.net/Articles/573497/).

Let me explain it in a bit more detail:

1) in the SCSI quiesce path:

We can regard freeze/unfreeze as containing one synchronize_rcu().

Let's see the RCU document:

Documentation/RCU/whatisRCU.txt:

	void synchronize_rcu(void);

    Marks the end of updater code and the beginning of reclaimer
    code.  It does this by blocking until all pre-existing RCU
    read-side critical sections on all CPUs have completed.
    Note that synchronize_rcu() will -not- necessarily wait for
    any subsequent RCU read-side critical sections to complete.
    For example, consider the following sequence of events:

So synchronize_rcu() in the SCSI quiesce path only waits for the
completion of pre-existing read-side critical sections; subsequent RCU
read-side critical sections won't be waited for.

2) in the normal I/O path of blk_queue_enter()

- the RCU read lock is held only inside percpu_ref_tryget_live(), and
is released when that helper returns.

- there is no explicit barrier (smp_mb()) between percpu_ref_tryget_live()
and the check of the preempt-only flag, so the write to the percpu_ref
counter and the read of the preempt flag can be reordered as follows:

	-- check flag of preempt only

		the current process is preempted out now, and just at that
		exact time SCSI quiesce runs from another process; then
		freeze/unfreeze completes because there are no pre-existing
		read-side critical sections, and the percpu_ref isn't held
		either.

		finally this process is preempted back in, grabs one ref
		and submits I/O after SCSI has been quiesced (which
		shouldn't be allowed)

	-- percpu_ref_tryget_live()


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests
  2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
  2017-09-26  6:05   ` Hannes Reinecke
@ 2017-09-26  8:34   ` Ming Lei
  2017-10-02 13:29   ` Christoph Hellwig
  2 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2017-09-26  8:34 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Hannes Reinecke, Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:19PM -0700, Bart Van Assche wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> This patch makes it possible to pause request allocation for
> the legacy block layer by calling blk_mq_freeze_queue() and
> blk_mq_unfreeze_queue().
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [ bvanassche: Combined two patches into one, edited a comment and made sure
>   REQ_NOWAIT is handled properly in blk_old_get_request() ]
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---

Thanks for considering the legacy block path in this version!

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
  2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
                   ` (6 preceding siblings ...)
  2017-09-25 20:29 ` [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably Bart Van Assche
@ 2017-09-26  9:15 ` Ming Lei
  2017-09-26 14:29   ` Bart Van Assche
  7 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26  9:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	=Oleksandr Natalenko

On Mon, Sep 25, 2017 at 01:29:17PM -0700, Bart Van Assche wrote:
> Hello Jens,
> 
> It is known that during the resume following a hibernate, especially when
> using an md RAID1 array created on top of SCSI devices, sometimes the
> system hangs instead of coming up properly. This patch series fixes this
> problem. This patch series is an alternative for Ming Lei's "[PATCH V5
> 0/10] block/scsi: safe SCSI quiescing" patch series. The advantages of
> this patch series are:

No, if you only address issue on MD device, that is definitely not
alternative of my patchset.

People have verified that my patches fix both the MD and
non-MD (btrfs/raid, transport_spi) cases, and my patchset can fix this
kind of issue on any SCSI device, with or without MD.

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
  2017-09-25 23:04   ` Ming Lei
  2017-09-26  6:06   ` Hannes Reinecke
@ 2017-09-26 11:17   ` Ming Lei
  2017-09-26 14:42       ` Bart Van Assche
  2017-10-02 13:26   ` Christoph Hellwig
  3 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26 11:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	=Oleksandr Natalenko, Shaohua Li, linux-raid, Hannes Reinecke,
	Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:18PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 08fcaebc61bd..26a12bd0db65 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,7 @@
>  #include <linux/raid/md_u.h>
>  #include <linux/slab.h>
>  #include <linux/percpu-refcount.h>
> +#include <linux/freezer.h>
>  
>  #include <trace/events/block.h>
>  #include "md.h"
> @@ -7424,6 +7425,7 @@ static int md_thread(void *arg)
>  	 */
>  
>  	allow_signal(SIGKILL);
> +	set_freezable();
>  	while (!kthread_should_stop()) {
>  
>  		/* We need to wait INTERRUPTIBLE so that
> @@ -7434,7 +7436,7 @@ static int md_thread(void *arg)
>  		if (signal_pending(current))
>  			flush_signals(current);
>  
> -		wait_event_interruptible_timeout
> +		wait_event_freezable_timeout
>  			(thread->wqueue,
>  			 test_bit(THREAD_WAKEUP, &thread->flags)
>  			 || kthread_should_stop() || kthread_should_park(),
> @@ -8133,6 +8135,8 @@ void md_do_sync(struct md_thread *thread)
>  		return;
>  	}
>  
> +	set_freezable();
> +
>  	if (mddev_is_clustered(mddev)) {
>  		ret = md_cluster_ops->resync_start(mddev);
>  		if (ret)
> @@ -8324,7 +8328,7 @@ void md_do_sync(struct md_thread *thread)
>  		     mddev->curr_resync_completed > mddev->resync_max
>  			    )) {
>  			/* time to update curr_resync_completed */
> -			wait_event(mddev->recovery_wait,
> +			wait_event_freezable(mddev->recovery_wait,
>  				   atomic_read(&mddev->recovery_active) == 0);
>  			mddev->curr_resync_completed = j;
>  			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
> @@ -8342,10 +8346,10 @@ void md_do_sync(struct md_thread *thread)
>  			 * to avoid triggering warnings.
>  			 */
>  			flush_signals(current); /* just in case */
> -			wait_event_interruptible(mddev->recovery_wait,
> -						 mddev->resync_max > j
> -						 || test_bit(MD_RECOVERY_INTR,
> -							     &mddev->recovery));
> +			wait_event_freezable(mddev->recovery_wait,
> +					     mddev->resync_max > j ||
> +					     test_bit(MD_RECOVERY_INTR,
> +						      &mddev->recovery));
>  		}
>  
>  		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> @@ -8421,7 +8425,7 @@ void md_do_sync(struct md_thread *thread)
>  				 * Give other IO more of a chance.
>  				 * The faster the devices, the less we wait.
>  				 */
> -				wait_event(mddev->recovery_wait,
> +				wait_event_freezable(mddev->recovery_wait,
>  					   !atomic_read(&mddev->recovery_active));
>  			}
>  		}
> @@ -8433,7 +8437,8 @@ void md_do_sync(struct md_thread *thread)
>  	 * this also signals 'finished resyncing' to md_stop
>  	 */
>  	blk_finish_plug(&plug);
> -	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> +	wait_event_freezable(mddev->recovery_wait,
> +			     !atomic_read(&mddev->recovery_active));
>  
>  	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>  	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> -- 
> 2.14.1
> 

Just test this patch a bit and the following failure of freezing task
is triggered during suspend:

[   38.903513] PM: suspend entry (deep)
[   38.904443] PM: Syncing filesystems ... done.
[   38.983591] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   38.987522] OOM killer disabled.
[   38.987962] Freezing remaining freezable tasks ...
[   58.998872] Freezing of tasks failed after 20.008 seconds (1 tasks refusing to freeze, wq_busy=0):
[   59.002539] md127_resync    D    0  1618      2 0x80000000
[   59.004954] Call Trace:
[   59.006162]  __schedule+0x41f/0xa50
[   59.007704]  schedule+0x3d/0x90
[   59.009305]  raid1_sync_request+0x2da/0xd10 [raid1]
[   59.011505]  ? remove_wait_queue+0x70/0x70
[   59.013352]  md_do_sync+0xdfa/0x12c0
[   59.014955]  ? remove_wait_queue+0x70/0x70
[   59.016336]  md_thread+0x1a8/0x1e0
[   59.016770]  ? md_thread+0x1a8/0x1e0
[   59.017250]  kthread+0x155/0x190
[   59.017662]  ? sync_speed_show+0xa0/0xa0
[   59.018217]  ? kthread_create_on_node+0x70/0x70
[   59.018858]  ret_from_fork+0x2a/0x40
[   59.019403] Restarting kernel threads ... done.
[   59.024586] OOM killer enabled.
[   59.025124] Restarting tasks ... done.
[   59.045906] PM: suspend exit
[   97.919428] systemd-journald[227]: Sent WATCHDOG=1 notification.
[  101.002695] md: md127: data-check done.



-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
  2017-09-26  9:15 ` [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Ming Lei
@ 2017-09-26 14:29   ` Bart Van Assche
  2017-09-26 14:54     ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-26 14:29 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, oleksandr, martin.petersen, axboe

On Tue, 2017-09-26 at 17:15 +0800, Ming Lei wrote:
> No, if you only address issue on MD device, that is definitely not
> alternative of my patchset.

A clarification: my patch series not only fixes suspend and resume for md-on-SCSI
but also for all other scenario's in which resume locks up due to no tag being
available when the SCSI core tries to execute a power management command.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26  8:13         ` Ming Lei
@ 2017-09-26 14:40             ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-26 14:40 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> I am pretty sure that suspend/resume can survive when resync in progress
> with my patchset applied on RAID1, without any MD change.

The above shows that you do not understand how suspend and resume work.
In the documents in the directory Documentation/power it is explained clearly
that I/O must be stopped before the hibernation image is generated, to avoid
hard-to-repair filesystem corruption. Although md is not a filesystem, I think
this also applies to md since md keeps some state information in RAM and some
state information on disk. It is essential that all that state information is
consistent.

> If your patchset depends on this MD change, something should be wrong
> in the following patches. Now I need to take a close look.

The later patches that make SCSI quiesce and resume safe do not depend on
this change.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26 11:17   ` Ming Lei
@ 2017-09-26 14:42       ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-26 14:42 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> Just test this patch a bit and the following failure of freezing task
> is triggered during suspend: [ ... ]

What kernel version did you start from and which patches were applied on top of
that kernel? Only patch 1/7 or all seven patches? What storage configuration did
you use in your test and what command(s) did you use to trigger suspend?

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably
  2017-09-26  8:32       ` Ming Lei
@ 2017-09-26 14:44         ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-26 14:44 UTC (permalink / raw)
  To: ming.lei
  Cc: hch, axboe, hare, jthumshirn, oleksandr, linux-block, martin.petersen

On Tue, 2017-09-26 at 16:32 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 11:13:47PM +0000, Bart Van Assche wrote:
> > Sorry but I disagree. I'm using RCU to achieve the same effect as a barrier
> > and to move the cost of the barrier from the reader to the updater. See also
> > Paul E. McKenney, Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett,
> > The RCU-barrier menagerie, LWN.net, November 12, 2013
> > (https://lwn.net/Articles/573497/).
> 
> Let me explain it in a bit details: [ ... ]

The approach I used in patch 7/7 is identical to one of the approaches explained
in the article I referred to. If you do not agree that that approach is safe, that
means that you do not agree with Paul McKenney's view on how RCU works.
Do you really think that you know more about RCU than Paul McKenney?

Bart.
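For readers following the thread: the reader/updater pattern from the LWN article, which patch 7/7 presumably applies to queue freezing, looks roughly like the following kernel-style pseudocode (a simplified sketch under that assumption, not the actual patch):

```
/* reader (hot path): no memory barrier needed */
rcu_read_lock();
if (!blk_queue_preempt_only(q))
        success = percpu_ref_tryget_live(&q->q_usage_counter);
rcu_read_unlock();

/* updater (slow path) pays the ordering cost */
blk_set_preempt_only(q, true);
synchronize_rcu();      /* every in-flight reader has now either seen
                           the flag or already taken its reference */
blk_mq_freeze_queue(q); /* waits for the remaining references */
blk_mq_unfreeze_queue(q);
```

The point of the pattern is that the updater's synchronize_rcu() stands in for a read-side barrier: once it returns, no reader can still be between the flag check and the tryget, which closes the reordering window discussed earlier in the thread.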

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
  2017-09-26 14:29   ` Bart Van Assche
@ 2017-09-26 14:54     ` Ming Lei
  2017-09-26 20:17       ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26 14:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, oleksandr, martin.petersen, axboe

On Tue, Sep 26, 2017 at 02:29:06PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 17:15 +0800, Ming Lei wrote:
> > No, if you only address issue on MD device, that is definitely not
> > alternative of my patchset.
> 
> A clarification: my patch series not only fixes suspend and resume for md-on-SCSI
> but also for all other scenario's in which resume locks up due to no tag being

I do care about if this patchset can fix non-MD cases, like
btrfs/raid, or transport_spi, both are real reports. Could you
make sure if your patchset can cover this non-MD devices?

> available when the SCSI core tries to execute a power management command.

Firstly, your cover letter title is 'Make suspend and resume safe for md-on-SCSI',
which makes me a bit nervous since it only mentions md-on-SCSI.

Secondly, if your changes to block/SCSI work for both MD and non-MD, why is MD so
special that it has to be dealt with by the 1st MD patch? Can we remove it from this
patchset? I am pretty sure that my patchset doesn't need any MD-specific change.

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26 14:42       ` Bart Van Assche
  (?)
@ 2017-09-26 14:59       ` Ming Lei
  2017-09-27  3:12           ` Bart Van Assche
  -1 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26 14:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > Just test this patch a bit and the following failure of freezing task
> > is triggered during suspend: [ ... ]
> 
> What kernel version did you start from and which patches were applied on top of
> that kernel? Only patch 1/7 or all seven patches? What storage configuration did

It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
applied.

> you use in your test and what command(s) did you use to trigger suspend?

Follows my pm test script:

	#!/bin/sh
	
	echo check > /sys/block/md127/md/sync_action
	
	mkfs.ext4 -F /dev/md127
	
	mount /dev/md0 /mnt/data
	
	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
	
	echo 9 > /proc/sys/kernel/printk
	echo devices > /sys/power/pm_test
	echo mem > /sys/power/state
	
	wait
	umount /mnt/data

Storage setting:

	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
	both /dev/sda and /dev/sdb are virtio-scsi.


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26 14:40             ` Bart Van Assche
  (?)
@ 2017-09-26 15:02             ` Ming Lei
  -1 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2017-09-26 15:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, Sep 26, 2017 at 02:40:36PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 16:13 +0800, Ming Lei wrote:
> > I am pretty sure that suspend/resume can survive when resync in progress
> > with my patchset applied on RAID1, without any MD change.
> 
> The above shows that you do not understand how suspend and resume work.
> In the documents in the directory Documentation/power it is explained clearly
> that I/O must be stopped before the hibernation image is generated, to avoid

No, I don't use hibernation; I just use suspend/resume (s2r).

> hard to repair filesystem corruption. Although md is not a filesystem I think
> this also applies to md since md keeps some state information in RAM and some
> state information on disk. It is essential that all that state information is
> in consistent.
> 
> > If your patchset depends on this MD change, something should be wrong
> > in the following patches. Now I need to take a close look.
> 
> The later patches that make SCSI quiesce and resume safe do not depend on
> this change.

Are you sure?

If I remove the 1st patch, system suspend/resume hangs even with your
other six patches applied.


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
  2017-09-26 14:54     ` Ming Lei
@ 2017-09-26 20:17       ` Bart Van Assche
  2017-09-26 22:47         ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-26 20:17 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, oleksandr, martin.petersen, axboe

On Tue, 2017-09-26 at 22:54 +0800, Ming Lei wrote:
> On Tue, Sep 26, 2017 at 02:29:06PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 17:15 +0800, Ming Lei wrote:
> > > No, if you only address issue on MD device, that is definitely not
> > > alternative of my patchset.
> > 
> > A clarification: my patch series not only fixes suspend and resume for md-on-SCSI
> > but also for all other scenario's in which resume locks up due to no tag being
> > available when the SCSI core tries to execute a power management command.
> 
> I do care about if this patchset can fix non-MD cases, like
> btrfs/raid, or transport_spi, both are real reports. Could you
> make sure if your patchset can cover this non-MD devices?

Yes, I'm sure this patch series covers non-MD devices.

It is easy to see that this patch series also covers SCSI parallel domain
validation.

Regarding BTRFS RAID: you should know that filesystems are responsible for
implementing freeze / thaw support themselves. I think a filesystem must
surround metadata operations by sb_start_write() / sb_end_write() and that
it optionally can provide freeze_super / thaw_super callbacks in
struct super_operations. In the BTRFS changelog there are several fixes
with regard to freeze / thaw so I think this aspect of BTRFS has been tested.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/7] scsi: Reduce suspend latency
  2017-09-25 20:29 ` [PATCH v4 5/7] scsi: Reduce suspend latency Bart Van Assche
@ 2017-09-26 22:23   ` Ming Lei
  2017-09-27  3:43     ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-26 22:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	=Oleksandr Natalenko, Hannes Reinecke, Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:22PM -0700, Bart Van Assche wrote:
> Avoid that it can take 200 ms too long to wait for requests to
> finish. Note: blk_mq_freeze_queue() uses a wait queue to wait
> for ongoing requests to finish.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/scsi/scsi_lib.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 62f905b22821..c261498606c4 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2927,6 +2927,7 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
>  int
>  scsi_device_quiesce(struct scsi_device *sdev)
>  {
> +	struct request_queue *q = sdev->request_queue;
>  	int err;
>  
>  	mutex_lock(&sdev->state_mutex);
> @@ -2936,11 +2937,8 @@ scsi_device_quiesce(struct scsi_device *sdev)
>  	if (err)
>  		return err;
>  
> -	scsi_run_queue(sdev->request_queue);
> -	while (atomic_read(&sdev->device_busy)) {
> -		msleep_interruptible(200);
> -		scsi_run_queue(sdev->request_queue);
> -	}
> +	blk_mq_freeze_queue(q);
> +	blk_mq_unfreeze_queue(q);
>  	return 0;

I now see a serious issue with this patch, one that should have been found
much earlier; it slipped through because the code that sets the quiesce
state isn't visible in the patch context. Sorry for finding it so late.

Once the patch is applied, scsi_device_quiesce() takes the following
shape:

        mutex_lock(&sdev->state_mutex);
        err = scsi_device_set_state(sdev, SDEV_QUIESCE);
        if (err == 0)
                blk_set_preempt_only(q, true);
        mutex_unlock(&sdev->state_mutex);

        if (err)
                return err;

        blk_mq_freeze_queue(q);
        blk_mq_unfreeze_queue(q);

Any request that is allocated before scsi_device_set_state() but dispatched
after it can no longer be drained, so blk_mq_freeze_queue() will wait forever
for the drain to complete. Not only does this patchset fail to fix the current
quiesce issue, it also introduces a new I/O hang in scsi_device_quiesce().

Now this approach looks fundamentally broken.

NAK.

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI
  2017-09-26 20:17       ` Bart Van Assche
@ 2017-09-26 22:47         ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2017-09-26 22:47 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, linux-block, oleksandr, martin.petersen, axboe

On Tue, Sep 26, 2017 at 08:17:59PM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 22:54 +0800, Ming Lei wrote:
> > On Tue, Sep 26, 2017 at 02:29:06PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-09-26 at 17:15 +0800, Ming Lei wrote:
> > > > No, if you only address issue on MD device, that is definitely not
> > > > alternative of my patchset.
> > > 
> > > A clarification: my patch series not only fixes suspend and resume for md-on-SCSI
> > > but also for all other scenario's in which resume locks up due to no tag being
> > > available when the SCSI core tries to execute a power management command.> 
> > I do care about if this patchset can fix non-MD cases, like
> > btrfs/raid, or transport_spi, both are real reports. Could you
> > make sure if your patchset can cover this non-MD devices?
> 
> Yes, I'm sure this patch series covers non-MD devices.
> 
> It is easy to see that this patch series also covers SCSI parallel domain
> validation.

Now I don't believe this patchset can do that; see my comment on your
patch 5:

https://www.mail-archive.com/linux-block@vger.kernel.org/msg13624.html

Worse still, your patch introduces a new I/O hang inside SCSI quiesce.

> 
> Regarding BTRFS RAID: you should know that filesystems are responsible for
> implementing freeze / thaw support themselves. I think a filesystem must
> surround metadata operations by sb_start_write() / sb_end_write() and that
> it optionally can provide freeze_super / thaw_super callbacks in
> struct super_operations. In the BTRFS changelog there are several fixes
> with regard to freeze / thaw so I think this aspect of BTRFS has been tested.
> 
> Bart.

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-26 14:59       ` Ming Lei
@ 2017-09-27  3:12           ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-27  3:12 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, 2017-09-26 at 22:59 +0800, Ming Lei wrote:
> On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > > Just test this patch a bit and the following failure of freezing task
> > > is triggered during suspend: [ ... ]
> > 
> > What kernel version did you start from and which patches were applied on top of
> > that kernel? Only patch 1/7 or all seven patches? What storage configuration did
> 
> It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
> applied.
> 
> > you use in your test and what command(s) did you use to trigger suspend?
> 
> Follows my pm test script:
> 
> 	#!/bin/sh
> 	
> 	echo check > /sys/block/md127/md/sync_action
> 	
> 	mkfs.ext4 -F /dev/md127
> 	
> 	mount /dev/md0 /mnt/data
> 	
> 	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
> 	
> 	echo 9 > /proc/sys/kernel/printk
> 	echo devices > /sys/power/pm_test
> 	echo mem > /sys/power/state
> 	
> 	wait
> 	umount /mnt/data
> 
> Storage setting:
> 
> 	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
> 	both /dev/sda and /dev/sdb are virtio-scsi.

Thanks for the detailed reply. I have been able to reproduce the freeze failure
you reported. The output of SysRq-t showed me that the md reboot notifier was
waiting for the frozen md sync thread and that this caused the freeze failure. So
I have started testing the patch below instead of the patch at the start of this
e-mail thread:


Subject: [PATCH] md: Stop resync and reshape upon system freeze

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

---
 drivers/md/md.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..1e9d50f7345e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -8103,6 +8104,12 @@ void md_allow_write(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(md_allow_write);
 
+static bool md_sync_interrupted(struct mddev *mddev)
+{
+	return test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+		freezing(current);
+}
+
 #define SYNC_MARKS	10
 #define	SYNC_MARK_STEP	(3*HZ)
 #define UPDATE_FREQUENCY (5*60*HZ)
@@ -8133,6 +8140,8 @@ void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8184,7 +8193,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->curr_resync = 2;
 
 	try_again:
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			goto skip;
 		for_each_mddev(mddev2, tmp) {
 			if (mddev2 == mddev)
@@ -8208,7 +8217,7 @@ void md_do_sync(struct md_thread *thread)
 				 * be caught by 'softlockup'
 				 */
 				prepare_to_wait(&resync_wait, &wq, TASK_INTERRUPTIBLE);
-				if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
+				if (!md_sync_interrupted(mddev) &&
 				    mddev2->curr_resync >= mddev->curr_resync) {
 					if (mddev2_minor != mddev2->md_minor) {
 						mddev2_minor = mddev2->md_minor;
@@ -8335,8 +8344,7 @@ void md_do_sync(struct md_thread *thread)
 			sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 		}
 
-		while (j >= mddev->resync_max &&
-		       !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+		while (j >= mddev->resync_max && !md_sync_interrupted(mddev)) {
 			/* As this condition is controlled by user-space,
 			 * we can block indefinitely, so use '_interruptible'
 			 * to avoid triggering warnings.
@@ -8348,7 +8356,7 @@ void md_do_sync(struct md_thread *thread)
 							     &mddev->recovery));
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		sectors = mddev->pers->sync_request(mddev, j, &skipped);
@@ -8362,7 +8370,7 @@ void md_do_sync(struct md_thread *thread)
 			atomic_add(sectors, &mddev->recovery_active);
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		j += sectors;
@@ -8394,7 +8402,7 @@ void md_do_sync(struct md_thread *thread)
 			last_mark = next;
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		/*
@@ -8427,8 +8435,7 @@ void md_do_sync(struct md_thread *thread)
 		}
 	}
 	pr_info("md: %s: %s %s.\n",mdname(mddev), desc,
-		test_bit(MD_RECOVERY_INTR, &mddev->recovery)
-		? "interrupted" : "done");
+		md_sync_interrupted(mddev) ? "interrupted" : "done");
 	/*
 	 * this also signals 'finished resyncing' to md_stop
 	 */
@@ -8436,8 +8443,7 @@ void md_do_sync(struct md_thread *thread)
 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
-	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
-	    mddev->curr_resync > 3) {
+	    !md_sync_interrupted(mddev) && mddev->curr_resync > 3) {
 		mddev->curr_resync_completed = mddev->curr_resync;
 		sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 	}
@@ -8446,7 +8452,7 @@ void md_do_sync(struct md_thread *thread)
 	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
 	    mddev->curr_resync > 3) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
-			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+			if (md_sync_interrupted(mddev)) {
 				if (mddev->curr_resync >= mddev->recovery_cp) {
 					pr_debug("md: checkpointing %s of %s.\n",
 						 desc, mdname(mddev));
@@ -8461,7 +8467,7 @@ void md_do_sync(struct md_thread *thread)
 			} else
 				mddev->recovery_cp = MaxSector;
 		} else {
-			if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			if (!md_sync_interrupted(mddev))
 				mddev->curr_resync = MaxSector;
 			rcu_read_lock();
 			rdev_for_each_rcu(rdev, mddev)
@@ -8483,7 +8489,7 @@ void md_do_sync(struct md_thread *thread)
 		      BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
 
 	spin_lock(&mddev->lock);
-	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+	if (!md_sync_interrupted(mddev)) {
 		/* We completed so min/max setting can be forgotten if used. */
 		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 			mddev->resync_min = 0;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
@ 2017-09-27  3:12           ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-09-27  3:12 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Tue, 2017-09-26 at 22:59 +0800, Ming Lei wrote:
> On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > > Just test this patch a bit and the following failure of freezing task
> > > is triggered during suspend: [ ... ]
> > 
> > What kernel version did you start from and which patches were applied on top of
> > that kernel? Only patch 1/7 or all seven patches? What storage configuration did
> 
> It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
> applied.
> 
> > you use in your test and what command(s) did you use to trigger suspend?
> 
> Follows my pm test script:
> 
> 	#!/bin/sh
> 	
> 	echo check > /sys/block/md127/md/sync_action
> 	
> 	mkfs.ext4 -F /dev/md127
> 	
> 	mount /dev/md0 /mnt/data
> 	
> 	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
> 	
> 	echo 9 > /proc/sys/kernel/printk
> 	echo devices > /sys/power/pm_test
> 	echo mem > /sys/power/state
> 	
> 	wait
> 	umount /mnt/data
> 
> Storage setting:
> 
> 	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
> 	both /dev/sda and /dev/sdb are virtio-scsi.

Thanks for the detailed reply. I have been able to reproduce the freeze failure
you reported. The output of SysRq-t showed me that the md reboot notifier was
waiting for the frozen md sync thread and that this caused the freeze failure. So
I have started testing the patch below instead of the patch at the start of this
e-mail thread:


Subject: [PATCH] md: Stop resync and reshape upon system freeze

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

---
 drivers/md/md.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..1e9d50f7345e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -8103,6 +8104,12 @@ void md_allow_write(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(md_allow_write);
 
+static bool md_sync_interrupted(struct mddev *mddev)
+{
+	return test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+		freezing(current);
+}
+
 #define SYNC_MARKS	10
 #define	SYNC_MARK_STEP	(3*HZ)
 #define UPDATE_FREQUENCY (5*60*HZ)
@@ -8133,6 +8140,8 @@ void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8184,7 +8193,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->curr_resync = 2;
 
 	try_again:
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			goto skip;
 		for_each_mddev(mddev2, tmp) {
 			if (mddev2 == mddev)
@@ -8208,7 +8217,7 @@ void md_do_sync(struct md_thread *thread)
 				 * be caught by 'softlockup'
 				 */
 				prepare_to_wait(&resync_wait, &wq, TASK_INTERRUPTIBLE);
-				if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
+				if (!md_sync_interrupted(mddev) &&
 				    mddev2->curr_resync >= mddev->curr_resync) {
 					if (mddev2_minor != mddev2->md_minor) {
 						mddev2_minor = mddev2->md_minor;
@@ -8335,8 +8344,7 @@ void md_do_sync(struct md_thread *thread)
 			sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 		}
 
-		while (j >= mddev->resync_max &&
-		       !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+		while (j >= mddev->resync_max && !md_sync_interrupted(mddev)) {
 			/* As this condition is controlled by user-space,
 			 * we can block indefinitely, so use '_interruptible'
 			 * to avoid triggering warnings.
@@ -8348,7 +8356,7 @@ void md_do_sync(struct md_thread *thread)
 							     &mddev->recovery));
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		sectors = mddev->pers->sync_request(mddev, j, &skipped);
@@ -8362,7 +8370,7 @@ void md_do_sync(struct md_thread *thread)
 			atomic_add(sectors, &mddev->recovery_active);
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		j += sectors;
@@ -8394,7 +8402,7 @@ void md_do_sync(struct md_thread *thread)
 			last_mark = next;
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		/*
@@ -8427,8 +8435,7 @@ void md_do_sync(struct md_thread *thread)
 		}
 	}
 	pr_info("md: %s: %s %s.\n",mdname(mddev), desc,
-		test_bit(MD_RECOVERY_INTR, &mddev->recovery)
-		? "interrupted" : "done");
+		md_sync_interrupted(mddev) ? "interrupted" : "done");
 	/*
 	 * this also signals 'finished resyncing' to md_stop
 	 */
@@ -8436,8 +8443,7 @@ void md_do_sync(struct md_thread *thread)
 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
-	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
-	    mddev->curr_resync > 3) {
+	    !md_sync_interrupted(mddev) && mddev->curr_resync > 3) {
 		mddev->curr_resync_completed = mddev->curr_resync;
 		sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 	}
@@ -8446,7 +8452,7 @@ void md_do_sync(struct md_thread *thread)
 	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
 	    mddev->curr_resync > 3) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
-			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+			if (md_sync_interrupted(mddev)) {
 				if (mddev->curr_resync >= mddev->recovery_cp) {
 					pr_debug("md: checkpointing %s of %s.\n",
 						 desc, mdname(mddev));
@@ -8461,7 +8467,7 @@ void md_do_sync(struct md_thread *thread)
 			} else
 				mddev->recovery_cp = MaxSector;
 		} else {
-			if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			if (!md_sync_interrupted(mddev))
 				mddev->curr_resync = MaxSector;
 			rcu_read_lock();
 			rdev_for_each_rcu(rdev, mddev)
@@ -8483,7 +8489,7 @@ void md_do_sync(struct md_thread *thread)
 		      BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
 
 	spin_lock(&mddev->lock);
-	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+	if (!md_sync_interrupted(mddev)) {
 		/* We completed so min/max setting can be forgotten if used. */
 		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 			mddev->resync_min = 0;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/7] scsi: Reduce suspend latency
  2017-09-26 22:23   ` Ming Lei
@ 2017-09-27  3:43     ` Bart Van Assche
  2017-09-27  5:37       ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-09-27  3:43 UTC (permalink / raw)
  To: ming.lei
  Cc: hch, hare, jthumshirn, linux-block, oleksandr, martin.petersen, axboe

On Wed, 2017-09-27 at 06:23 +0800, Ming Lei wrote:
>         mutex_lock(&sdev->state_mutex);
>         err = scsi_device_set_state(sdev, SDEV_QUIESCE);
>         if (err == 0)
>                 blk_set_preempt_only(q, true);
>         mutex_unlock(&sdev->state_mutex);
> 
>         if (err)
>                 return err;
> 
>         blk_mq_freeze_queue(q);
>         blk_mq_unfreeze_queue(q);
> 
> Any requests allocated before scsi_device_set_state() and
> dispatched after scsi_device_set_state() can't be drained up
> any more, then blk_mq_freeze_queue() will wait forever for the
> drain up, so not only this patchset can't fix the current quiesce
> issue, but also introduces new I/O hang in scsi_device_quiesce().

That's a good catch, but fortunately this is very easy to fix: move the
blk_mq_freeze_queue() call before the mutex_lock() and scsi_device_set_state()
calls. The error path will also need some adjustment.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 5/7] scsi: Reduce suspend latency
  2017-09-27  3:43     ` Bart Van Assche
@ 2017-09-27  5:37       ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2017-09-27  5:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, hare, jthumshirn, linux-block, oleksandr, martin.petersen, axboe

On Wed, Sep 27, 2017 at 03:43:11AM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-27 at 06:23 +0800, Ming Lei wrote:
> >         mutex_lock(&sdev->state_mutex);
> >         err = scsi_device_set_state(sdev, SDEV_QUIESCE);
> >         if (err == 0)
> >                 blk_set_preempt_only(q, true);
> >         mutex_unlock(&sdev->state_mutex);
> > 
> >         if (err)
> >                 return err;
> > 
> >         blk_mq_freeze_queue(q);
> >         blk_mq_unfreeze_queue(q);
> > 
> > Any requests allocated before scsi_device_set_state() and
> > dispatched after scsi_device_set_state() can't be drained up
> > any more, then blk_mq_freeze_queue() will wait forever for the
> > drain up, so not only this patchset can't fix the current quiesce
> > issue, but also introduces new I/O hang in scsi_device_quiesce().
> 
> That's a good catch, but fortunately this is very easy to fix: move the
> blk_mq_freeze_queue() call before the mutex_lock() and scsi_device_set_state()
> calls. The error path will also need some adjustment.

I am also working towards that direction, patches have been ready.

-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread
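
To make the reordering concrete for readers skimming the thread: a sketch of scsi_device_quiesce() with the freeze moved ahead of the state change might look as follows. This is only an illustration of the ordering discussed above (kernel-style code, not compilable standalone and not the patch that was actually applied), reusing the function names quoted by Ming.

```c
/*
 * Sketch only: freeze the queue *before* taking state_mutex and
 * switching to SDEV_QUIESCE, so that requests allocated before the
 * state change are drained instead of making blk_mq_freeze_queue()
 * wait forever on requests that can no longer be dispatched.
 */
static int scsi_device_quiesce_sketch(struct scsi_device *sdev)
{
	struct request_queue *q = sdev->request_queue;
	int err;

	blk_mq_freeze_queue(q);		/* drain everything allocated so far */

	mutex_lock(&sdev->state_mutex);
	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
	if (err == 0)
		blk_set_preempt_only(q, true);
	mutex_unlock(&sdev->state_mutex);

	blk_mq_unfreeze_queue(q);	/* non-preempt submitters now block
					 * in blk_queue_enter() */
	return err;
}
```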

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-27  3:12           ` Bart Van Assche
  (?)
@ 2017-09-27 11:00           ` Ming Lei
  2017-10-02 15:39               ` Bart Van Assche
  -1 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2017-09-27 11:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Wed, Sep 27, 2017 at 03:12:47AM +0000, Bart Van Assche wrote:
> On Tue, 2017-09-26 at 22:59 +0800, Ming Lei wrote:
> > On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > > > Just test this patch a bit and the following failure of freezing task
> > > > is triggered during suspend: [ ... ]
> > > 
> > > What kernel version did you start from and which patches were applied on top of
> > > that kernel? Only patch 1/7 or all seven patches? What storage configuration did
> > 
> > It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
> > applied.
> > 
> > > you use in your test and what command(s) did you use to trigger suspend?
> > 
> > Follows my pm test script:
> > 
> > 	#!/bin/sh
> > 	
> > 	echo check > /sys/block/md127/md/sync_action
> > 	
> > 	mkfs.ext4 -F /dev/md127
> > 	
> > 	mount /dev/md0 /mnt/data
> > 	
> > 	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
> > 	
> > 	echo 9 > /proc/sys/kernel/printk
> > 	echo devices > /sys/power/pm_test
> > 	echo mem > /sys/power/state
> > 	
> > 	wait
> > 	umount /mnt/data
> > 
> > Storage setting:
> > 
> > 	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
> > 	both /dev/sda and /dev/sdb are virtio-scsi.
> 
> Thanks for the detailed reply. I have been able to reproduce the freeze failure
> you reported. The output of SysRq-t showed me that the md reboot notifier was
> waiting for the frozen md sync thread and that this caused the freeze failure. So
> I have started testing the patch below instead of the patch at the start of this
> e-mail thread:

OK, thanks for figuring out the reason.

With the current Linus tree, SCSI I/O is prevented from being dispatched to
the device during suspend by SCSI quiesce, and is dispatched again on
resume. With safe SCSI quiesce [1], any underlying I/O request will
stop at blk_queue_enter() during suspend and restart from there during
resume.

Other, non-SCSI drivers often handle I/O in a similar way in their
.suspend/.resume callbacks; for example, NVMe queues are frozen in
.suspend and unfrozen in .resume.

So could you explain a bit which kind of bug this patch fixes?

[1] https://marc.info/?l=linux-block&m=150649136519281&w=2


-- 
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
                     ` (2 preceding siblings ...)
  2017-09-26 11:17   ` Ming Lei
@ 2017-10-02 13:26   ` Christoph Hellwig
  3 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:26 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Shaohua Li, linux-raid, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn

Independent of whether this should be required to make SCSI quiesce
safe or not, making the md threads freeze-aware is the right thing
to do, and this patch looks correct to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests
  2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
  2017-09-26  6:05   ` Hannes Reinecke
  2017-09-26  8:34   ` Ming Lei
@ 2017-10-02 13:29   ` Christoph Hellwig
  2 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT
  2017-09-25 20:29 ` [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT Bart Van Assche
@ 2017-10-02 13:42   ` Christoph Hellwig
  2017-10-02 13:43     ` Christoph Hellwig
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

This looks ok to me, or at least better than the version from Ming to
achieve the same.  I kinda hate to add more REQ_* flags than
really necessary, though.  Maybe instead of the mapping to REQ_* as
suggested to Ming, blk_queue_enter should instead take the BLK_MQ_REQ_*
flags and we'll add BLK_MQ_REQ_PREEMPT, and pass those to
blk_get_request.  If it weren't for Linus being angry about block
churn I'd be very tempted to suggest that.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT
  2017-10-02 13:42   ` Christoph Hellwig
@ 2017-10-02 13:43     ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	=Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

On Mon, Oct 02, 2017 at 03:42:44PM +0200, Christoph Hellwig wrote:
> This looks ok to me, or at least better than the version from Ming to
> achieve the same.  I kinda hate to add more REQ_* flags than
> really necessary, though.  Maybe instead of the mapping to REQ_* as
> suggested to Ming, blk_queue_enter should instead take the BLK_MQ_REQ_*
> flags and we'll add BLK_MQ_REQ_PREEMPT, and pass those to
> blk_get_request.  If it weren't for Linus being angry about block
> churn I'd be very tempted to suggest that.

Actually, based on the next patch in the series from Ming we could
add a blk_get_request_flags() or similar that takes the BLK_MQ_REQ_*
flags, which would be my preferred method.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-09-25 20:29 ` [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
@ 2017-10-02 13:47   ` Christoph Hellwig
  2017-10-02 15:56     ` Bart Van Assche
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

> +void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(q->queue_lock, flags);
> +	if (preempt_only)
> +		queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
> +	else
> +		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> +	spin_unlock_irqrestore(q->queue_lock, flags);
> +}
> +EXPORT_SYMBOL(blk_set_preempt_only);

Why do we even need this helper?  The lock doesn't make sense to me,
and it would be much easier to set/clear the flag from the driver.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced
  2017-09-25 20:29 ` [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced Bart Van Assche
@ 2017-10-02 13:53   ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2017-10-02 13:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn

On Mon, Sep 25, 2017 at 01:29:23PM -0700, Bart Van Assche wrote:
> Make the quiesce state visible to the block layer for the next
> patch in this series.

This looks like the more correct version as everything is under
state_mutex.

Except for the fact that I'd like to get rid of the blk_set_preempt_only
helper this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
  2017-09-27 11:00           ` Ming Lei
@ 2017-10-02 15:39               ` Bart Van Assche
  0 siblings, 0 replies; 48+ messages in thread
From: Bart Van Assche @ 2017-10-02 15:39 UTC (permalink / raw)
  To: ming.lei
  Cc: linux-block, jthumshirn, linux-raid, hch, martin.petersen, axboe,
	oleksandr, hare, shli

On Wed, 2017-09-27 at 19:00 +0800, Ming Lei wrote:
> With the current Linus tree, SCSI I/O is prevented from being dispatched to
> the device during suspend by SCSI quiesce, and is dispatched again on
> resume. With safe SCSI quiesce [1], any underlying I/O request will
> stop at blk_queue_enter() during suspend and restart from there during
> resume.
> 
> Other, non-SCSI drivers often handle I/O in a similar way in their
> .suspend/.resume callbacks; for example, NVMe queues are frozen in
> .suspend and unfrozen in .resume.
> 
> So could you explain a bit which kind of bug this patch fixes?

It seems like you do not fully understand the motivation behind quiescing and
resuming I/O from inside the SCSI and NVMe freeze and thaw power management
callbacks. Hibernation can only work reliably if no I/O is submitted after
creation of the hibernation image has started. If any driver submits I/O that
is not related to the hibernation process itself after user space processes
and kernel threads have been frozen, then that is a *bug* in the component
that submitted the I/O. Freezing and suspending I/O from inside the SCSI and
NVMe drivers is a *workaround* for such bugs, not a full solution: before the
hibernation image is written to disk, I/O is resumed and the requests that
got deferred are executed at that time. The following quote from
Documentation/power/swsusp.txt illustrates this:

 * BIG FAT WARNING *********************************************************
 *
 * If you touch anything on disk between suspend and resume...
 *				...kiss your data goodbye.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-10-02 13:47   ` Christoph Hellwig
@ 2017-10-02 15:56     ` Bart Van Assche
  2017-10-02 16:14       ` hch
  0 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2017-10-02 15:56 UTC (permalink / raw)
  To: hch
  Cc: hare, jthumshirn, linux-block, oleksandr, martin.petersen, axboe,
	ming.lei

On Mon, 2017-10-02 at 15:47 +0200, Christoph Hellwig wrote:
> > +void blk_set_preempt_only(struct request_queue *q, bool preempt_only)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(q->queue_lock, flags);
> > +	if (preempt_only)
> > +		queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
> > +	else
> > +		queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> > +	spin_unlock_irqrestore(q->queue_lock, flags);
> > +}
> > +EXPORT_SYMBOL(blk_set_preempt_only);
> 
> Why do we even need this helper?  The lock doesn't make sense to me,
> and it would be much easier to set/clear the flag from the driver.

Hello Christoph,

Since queue_flag_set() and queue_flag_clear() use non-atomic primitives to
set and clear flags it is essential to protect calls of these functions with
the queue lock. Otherwise flag updates could get lost due to concurrent
queue_flag_set() or queue_flag_clear() calls.

One of the reasons why I introduced this helper function is to keep the
wake_up_all(&q->mq_freeze_wq) call that is added to this function by a later
patch in the block layer.

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-10-02 15:56     ` Bart Van Assche
@ 2017-10-02 16:14       ` hch
  0 siblings, 0 replies; 48+ messages in thread
From: hch @ 2017-10-02 16:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, hare, jthumshirn, linux-block, oleksandr, martin.petersen,
	axboe, ming.lei

On Mon, Oct 02, 2017 at 03:56:08PM +0000, Bart Van Assche wrote:
> Since queue_flag_set() and queue_flag_clear() use non-atomic primitives to
> set and clear flags it is essential to protect calls of these functions with
> the queue lock. Otherwise flag updates could get lost due to concurrent
> queue_flag_set() or queue_flag_clear() calls.
> 
> One of the reasons why I introduced this helper function is to keep the
> wake_up_all(&q->mq_freeze_wq) call that is added to this function by a later
> patch in the block layer.

Oh, despite the _unlocked versions existing, it doesn't do any atomic
ops.  Yeah, we'll need the queue lock then.

But please split it into one helper that sets the queue blocked state
and one that clears it, especially if the clear needs to do an
additional wake_up.

^ permalink raw reply	[flat|nested] 48+ messages in thread
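
Following up on that request, the split might look like the sketch below: one helper to set the flag, one to clear it, with the wake_up_all() that Bart mentions kept on the clear path, since that is the point at which tasks waiting in blk_queue_enter() can make progress again. Kernel-style illustration only, not a patch from this thread; the helper names are assumptions.

```c
/* Sketch: both helpers take the queue lock because queue_flag_set()
 * and queue_flag_clear() use non-atomic bitops. */
void blk_set_preempt_only(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	queue_flag_set(QUEUE_FLAG_PREEMPT_ONLY, q);
	spin_unlock_irqrestore(q->queue_lock, flags);
}

void blk_clear_preempt_only(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
	spin_unlock_irqrestore(q->queue_lock, flags);

	/* Wake tasks blocked in blk_queue_enter() now that the
	 * preempt-only restriction is lifted. */
	wake_up_all(&q->mq_freeze_wq);
}
```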

end of thread, other threads:[~2017-10-02 16:14 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-25 20:29 [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Bart Van Assche
2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
2017-09-25 23:04   ` Ming Lei
2017-09-25 23:09     ` Bart Van Assche
2017-09-25 23:09       ` Bart Van Assche
2017-09-26  4:01       ` Ming Lei
2017-09-26  8:13         ` Ming Lei
2017-09-26 14:40           ` Bart Van Assche
2017-09-26 14:40             ` Bart Van Assche
2017-09-26 15:02             ` Ming Lei
2017-09-26  6:06   ` Hannes Reinecke
2017-09-26 11:17   ` Ming Lei
2017-09-26 14:42     ` Bart Van Assche
2017-09-26 14:42       ` Bart Van Assche
2017-09-26 14:59       ` Ming Lei
2017-09-27  3:12         ` Bart Van Assche
2017-09-27  3:12           ` Bart Van Assche
2017-09-27 11:00           ` Ming Lei
2017-10-02 15:39             ` Bart Van Assche
2017-10-02 15:39               ` Bart Van Assche
2017-10-02 13:26   ` Christoph Hellwig
2017-09-25 20:29 ` [PATCH v4 2/7] block: Make q_usage_counter also track legacy requests Bart Van Assche
2017-09-26  6:05   ` Hannes Reinecke
2017-09-26  8:34   ` Ming Lei
2017-10-02 13:29   ` Christoph Hellwig
2017-09-25 20:29 ` [PATCH v4 3/7] block: Convert RQF_PREEMPT into REQ_PREEMPT Bart Van Assche
2017-10-02 13:42   ` Christoph Hellwig
2017-10-02 13:43     ` Christoph Hellwig
2017-09-25 20:29 ` [PATCH v4 4/7] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
2017-10-02 13:47   ` Christoph Hellwig
2017-10-02 15:56     ` Bart Van Assche
2017-10-02 16:14       ` hch
2017-09-25 20:29 ` [PATCH v4 5/7] scsi: Reduce suspend latency Bart Van Assche
2017-09-26 22:23   ` Ming Lei
2017-09-27  3:43     ` Bart Van Assche
2017-09-27  5:37       ` Ming Lei
2017-09-25 20:29 ` [PATCH v4 6/7] scsi: Set QUEUE_FLAG_PREEMPT_ONLY while quiesced Bart Van Assche
2017-10-02 13:53   ` Christoph Hellwig
2017-09-25 20:29 ` [PATCH v4 7/7] block: Make SCSI device suspend and resume work reliably Bart Van Assche
2017-09-25 22:59   ` Ming Lei
2017-09-25 23:13     ` Bart Van Assche
2017-09-26  8:32       ` Ming Lei
2017-09-26 14:44         ` Bart Van Assche
2017-09-26  9:15 ` [PATCH v4 0/7] Make suspend and resume safe for md-on-SCSI Ming Lei
2017-09-26 14:29   ` Bart Van Assche
2017-09-26 14:54     ` Ming Lei
2017-09-26 20:17       ` Bart Van Assche
2017-09-26 22:47         ` Ming Lei
