All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9 00/17] Improve performance for zoned UFS devices
@ 2023-08-16 19:53 Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 01/17] block: Introduce more member variables related to zone write locking Bart Van Assche
                   ` (16 more replies)
  0 siblings, 17 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche

Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Please
consider this patch series for the next merge window.

Thank you,

Bart.

Changes compared to v8:
 - Fixed handling of 'driver_preserves_write_order' and 'use_zone_write_lock'
   in blk_stack_limits().
 - Added a comment in disk_set_zoned().
 - Modified blk_req_needs_zone_write_lock() such that it returns false if
   q->limits.use_zone_write_lock is false.
 - Modified disk_clear_zone_settings() such that it clears
   q->limits.use_zone_write_lock.
 - Left out one change from the mq-deadline patch that became superfluous due to
   the blk_req_needs_zone_write_lock() change.
 - Modified scsi_call_prepare_resubmit() such that it only calls list_sort() if
   zoned writes have to be resubmitted for which zone write locking is disabled.
 - Added an additional unit test for scsi_call_prepare_resubmit().
 - Modified the sorting code in the sd driver such that only those SCSI commands
   are sorted for which write locking is disabled.
 - Modified sd_zbc.c such that ELEVATOR_F_ZBD_SEQ_WRITE is only set if the
   write order is not preserved.
 - Included three patches for UFS host drivers that rework code that wrote
   directly to the auto-hibernation controller register.
 - Modified the UFS driver such that enabling auto-hibernation is not allowed
   if a zoned logical unit is present and if the controller operates in legacy
   mode.
 - Also in the UFS driver, simplified ufshcd_auto_hibern8_update().

Changes compared to v7:
 - Split the queue_limits member variable `use_zone_write_lock' into two member
   variables: `use_zone_write_lock' (set by disk_set_zoned()) and
   `driver_preserves_write_order' (set by the block driver or SCSI LLD). This
   should clear up the confusion about the purpose of this variable.
 - Moved the code for sorting SCSI commands by LBA from the SCSI error handler
   into the SCSI disk (sd) driver as requested by Christoph.
   
Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (17):
  block: Introduce more member variables related to zone write locking
  block: Only use write locking if necessary
  block/mq-deadline: Only use zone locking if necessary
  scsi: core: Call .eh_prepare_resubmit() before resubmitting
  scsi: core: Add unit tests for scsi_call_prepare_resubmit()
  scsi: sd: Sort commands by LBA before resubmitting
  scsi: core: Retry unaligned zoned writes
  scsi: sd_zbc: Only require an I/O scheduler if needed
  scsi: scsi_debug: Support disabling zone write locking
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: ufs: hisi: Rework the code that disables auto-hibernation
  scsi: ufs: mediatek: Rework the code for disabling auto-hibernation
  scsi: ufs: sprd: Rework the code for disabling auto-hibernation
  scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  scsi: ufs: Simplify ufshcd_auto_hibern8_update()
  scsi: ufs: Forbid auto-hibernation without I/O scheduler
  scsi: ufs: Inform the block layer about write ordering

 block/blk-settings.c            |  15 +++
 block/blk-zoned.c               |  10 +-
 block/mq-deadline.c             |  11 +-
 drivers/scsi/Kconfig            |   2 +
 drivers/scsi/Kconfig.kunit      |   4 +
 drivers/scsi/Makefile           |   2 +
 drivers/scsi/Makefile.kunit     |   1 +
 drivers/scsi/scsi_debug.c       |  20 +++-
 drivers/scsi/scsi_error.c       |  81 +++++++++++++
 drivers/scsi/scsi_error_test.c  | 196 ++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c         |   1 +
 drivers/scsi/scsi_priv.h        |   1 +
 drivers/scsi/sd.c               |  41 +++++++
 drivers/scsi/sd_zbc.c           |   4 +-
 drivers/ufs/core/ufs-sysfs.c    |   2 +-
 drivers/ufs/core/ufshcd-priv.h  |   1 -
 drivers/ufs/core/ufshcd.c       | 110 ++++++++++++++----
 drivers/ufs/host/ufs-hisi.c     |   5 +-
 drivers/ufs/host/ufs-mediatek.c |   2 +-
 drivers/ufs/host/ufs-sprd.c     |  11 +-
 include/linux/blkdev.h          |  10 ++
 include/scsi/scsi.h             |   1 +
 include/scsi/scsi_driver.h      |   1 +
 include/ufs/ufshcd.h            |   3 +-
 24 files changed, 485 insertions(+), 50 deletions(-)
 create mode 100644 drivers/scsi/Kconfig.kunit
 create mode 100644 drivers/scsi/Makefile.kunit
 create mode 100644 drivers/scsi/scsi_error_test.c


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v9 01/17] block: Introduce more member variables related to zone write locking
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:00   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 02/17] block: Only use write locking if necessary Bart Van Assche
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei

Many but not all storage controllers require serialization of zoned writes.
Introduce two new request queue limit member variables related to write
serialization. 'driver_preserves_write_order' allows block drivers to
indicate that the order of write commands is preserved and hence that
serialization of writes per zone is not required. 'use_zone_write_lock' is
set by disk_set_zoned() if and only if the block device has zones and if
the block driver does not preserve the order of write requests.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-settings.c   | 15 +++++++++++++++
 block/blk-zoned.c      |  1 +
 include/linux/blkdev.h | 10 ++++++++++
 3 files changed, 26 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0046b447268f..4c776c08f190 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -56,6 +56,8 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->alignment_offset = 0;
 	lim->io_opt = 0;
 	lim->misaligned = 0;
+	lim->driver_preserves_write_order = false;
+	lim->use_zone_write_lock = false;
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
@@ -82,6 +84,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->max_dev_sectors = UINT_MAX;
 	lim->max_write_zeroes_sectors = UINT_MAX;
 	lim->max_zone_append_sectors = UINT_MAX;
+	/* Request-based stacking drivers do not reorder requests. */
+	lim->driver_preserves_write_order = true;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
@@ -685,6 +689,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 						   b->max_secure_erase_sectors);
 	t->zone_write_granularity = max(t->zone_write_granularity,
 					b->zone_write_granularity);
+	t->driver_preserves_write_order = t->driver_preserves_write_order &&
+		b->driver_preserves_write_order;
+	t->use_zone_write_lock = t->use_zone_write_lock ||
+		b->use_zone_write_lock;
 	t->zoned = max(t->zoned, b->zoned);
 	return ret;
 }
@@ -949,6 +957,13 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
 	}
 
 	q->limits.zoned = model;
+	/*
+	 * Use the zone write lock only for zoned block devices and only if
+	 * the block driver does not preserve the order of write commands.
+	 */
+	q->limits.use_zone_write_lock = model != BLK_ZONED_NONE &&
+		!q->limits.driver_preserves_write_order;
+
 	if (model != BLK_ZONED_NONE) {
 		/*
 		 * Set the zone write granularity to the device logical block
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 619ee41a51cc..112620985bff 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -631,6 +631,7 @@ void disk_clear_zone_settings(struct gendisk *disk)
 	q->limits.chunk_sectors = 0;
 	q->limits.zone_write_granularity = 0;
 	q->limits.max_zone_append_sectors = 0;
+	q->limits.use_zone_write_lock = false;
 
 	blk_mq_unfreeze_queue(q);
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4feed1fc141f..e6d5450c8488 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,16 @@ struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		raid_partial_stripes_expensive;
+	/*
+	 * Whether or not the block driver preserves the order of write
+	 * requests. Set by the block driver.
+	 */
+	bool			driver_preserves_write_order;
+	/*
+	 * Whether or not zone write locking should be used. Set by
+	 * disk_set_zoned().
+	 */
+	bool			use_zone_write_lock;
 	enum blk_zoned_model	zoned;
 
 	/*

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 02/17] block: Only use write locking if necessary
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 01/17] block: Introduce more member variables related to zone write locking Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:01   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 03/17] block/mq-deadline: Only use zone " Bart Van Assche
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei

Make blk_req_needs_zone_write_lock() return false if
q->limits.use_zone_write_lock is false. Inline this function because it
is short and because it is called from the hot path of the mq-deadline
I/O scheduler.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-zoned.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 112620985bff..d8a80cce832f 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -53,11 +53,16 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond)
 EXPORT_SYMBOL_GPL(blk_zone_cond_str);
 
 /*
- * Return true if a request is a write requests that needs zone write locking.
+ * Return true if a request is a write request that needs zone write locking.
  */
 bool blk_req_needs_zone_write_lock(struct request *rq)
 {
-	if (!rq->q->disk->seq_zones_wlock)
+	struct request_queue *q = rq->q;
+
+	if (!q->limits.use_zone_write_lock)
+		return false;
+
+	if (!q->disk->seq_zones_wlock)
 		return false;
 
 	return blk_rq_is_seq_zoned_write(rq);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 03/17] block/mq-deadline: Only use zone locking if necessary
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 01/17] block: Introduce more member variables related to zone write locking Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 02/17] block: Only use write locking if necessary Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:02   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting Bart Van Assche
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei

Measurements have shown that limiting the queue depth to one per zone for
zoned writes has a significant negative performance impact on zoned UFS
devices. Hence this patch that disables zone locking by the mq-deadline
scheduler if the storage controller preserves the command order. This
patch is based on the following assumptions:
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
  serializes write requests per zone.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..082ccf3186f4 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -353,7 +353,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;
 
 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !rq->q->limits.use_zone_write_lock)
 		return rq;
 
 	/*
@@ -398,7 +398,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !rq->q->limits.use_zone_write_lock)
 		return rq;
 
 	/*
@@ -526,8 +526,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}
 
 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -934,7 +935,7 @@ static void dd_finish_request(struct request *rq)
 
 	atomic_inc(&per_prio->stats.completed);
 
-	if (blk_queue_is_zoned(q)) {
+	if (rq->q->limits.use_zone_write_lock) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&dd->zone_lock, flags);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 03/17] block/mq-deadline: Only use zone " Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:10   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 05/17] scsi: core: Add unit tests for scsi_call_prepare_resubmit() Bart Van Assche
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley

Introduce the .eh_prepare_resubmit function pointer in struct scsi_driver.
Make the error handler call .eh_prepare_resubmit() before resubmitting
commands. A later patch will use this functionality to sort SCSI commands
by LBA from inside the SCSI disk driver.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c  | 65 ++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_priv.h   |  1 +
 include/scsi/scsi_driver.h |  1 +
 3 files changed, 67 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..4393a7fd8a07 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/jiffies.h>
+#include <linux/list_sort.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -2186,6 +2187,68 @@ void scsi_eh_ready_devs(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
 
+/*
+ * Returns true if the commands in @done_q should be sorted in LBA order
+ * before being resubmitted.
+ */
+static bool scsi_needs_sorting(struct list_head *done_q)
+{
+	struct scsi_cmnd *scmd;
+
+	list_for_each_entry(scmd, done_q, eh_entry) {
+		struct request *rq = scsi_cmd_to_rq(scmd);
+
+		if (!rq->q->limits.use_zone_write_lock &&
+		    blk_rq_is_seq_zoned_write(rq))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Comparison function that allows to sort SCSI commands by ULD driver.
+ */
+static int scsi_cmp_uld(void *priv, const struct list_head *_a,
+			const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+
+	/* See also the comment above the list_sort() definition. */
+	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
+}
+
+void scsi_call_prepare_resubmit(struct list_head *done_q)
+{
+	struct scsi_cmnd *scmd, *next;
+
+	if (!scsi_needs_sorting(done_q))
+		return;
+
+	/* Sort pending SCSI commands by ULD. */
+	list_sort(NULL, done_q, scsi_cmp_uld);
+
+	/*
+	 * Call .eh_prepare_resubmit for each range of commands with identical
+	 * ULD driver pointer.
+	 */
+	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
+		struct scsi_driver *uld = scsi_cmd_to_driver(scmd);
+		struct list_head *prev, uld_cmd_list;
+
+		while (&next->eh_entry != done_q &&
+		       scsi_cmd_to_driver(next) == uld)
+			next = list_next_entry(next, eh_entry);
+		if (!uld->eh_prepare_resubmit)
+			continue;
+		prev = scmd->eh_entry.prev;
+		list_cut_position(&uld_cmd_list, prev, next->eh_entry.prev);
+		uld->eh_prepare_resubmit(&uld_cmd_list);
+		list_splice(&uld_cmd_list, prev);
+	}
+}
+
 /**
  * scsi_eh_flush_done_q - finish processed commands or retry them.
  * @done_q:	list_head of processed commands.
@@ -2194,6 +2257,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *next;
 
+	scsi_call_prepare_resubmit(done_q);
+
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
 		list_del_init(&scmd->eh_entry);
 		if (scsi_device_online(scmd->device) &&
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f42388ecb024..df4af4645430 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -101,6 +101,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
 		      struct list_head *done_q);
 bool scsi_noretry_cmd(struct scsi_cmnd *scmd);
 void scsi_eh_done(struct scsi_cmnd *scmd);
+void scsi_call_prepare_resubmit(struct list_head *done_q);
 
 /* scsi_lib.c */
 extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 4ce1988b2ba0..2b11be896eee 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -18,6 +18,7 @@ struct scsi_driver {
 	int (*done)(struct scsi_cmnd *);
 	int (*eh_action)(struct scsi_cmnd *, int);
 	void (*eh_reset)(struct scsi_cmnd *);
+	void (*eh_prepare_resubmit)(struct list_head *cmd_list);
 };
 #define to_scsi_driver(drv) \
 	container_of((drv), struct scsi_driver, gendrv)

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 05/17] scsi: core: Add unit tests for scsi_call_prepare_resubmit()
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting Bart Van Assche
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley

Triggering all code paths in scsi_call_prepare_resubmit() via manual
testing is difficult. Hence add unit tests for this function.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/Kconfig           |   2 +
 drivers/scsi/Kconfig.kunit     |   4 +
 drivers/scsi/Makefile          |   2 +
 drivers/scsi/Makefile.kunit    |   1 +
 drivers/scsi/scsi_error_test.c | 196 +++++++++++++++++++++++++++++++++
 5 files changed, 205 insertions(+)
 create mode 100644 drivers/scsi/Kconfig.kunit
 create mode 100644 drivers/scsi/Makefile.kunit
 create mode 100644 drivers/scsi/scsi_error_test.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 4962ce989113..fc288f8fb800 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -232,6 +232,8 @@ config SCSI_SCAN_ASYNC
 	  Note that this setting also affects whether resuming from
 	  system suspend will be performed asynchronously.
 
+source "drivers/scsi/Kconfig.kunit"
+
 menu "SCSI Transports"
 	depends on SCSI
 
diff --git a/drivers/scsi/Kconfig.kunit b/drivers/scsi/Kconfig.kunit
new file mode 100644
index 000000000000..90984a6ec7cc
--- /dev/null
+++ b/drivers/scsi/Kconfig.kunit
@@ -0,0 +1,4 @@
+config SCSI_ERROR_TEST
+	tristate "scsi_error.c unit tests" if !KUNIT_ALL_TESTS
+	depends on SCSI && KUNIT
+	default KUNIT_ALL_TESTS
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index f055bfd54a68..1c5c3afb6c6e 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -168,6 +168,8 @@ scsi_mod-$(CONFIG_PM)		+= scsi_pm.o
 scsi_mod-$(CONFIG_SCSI_DH)	+= scsi_dh.o
 scsi_mod-$(CONFIG_BLK_DEV_BSG)	+= scsi_bsg.o
 
+include $(srctree)/drivers/scsi/Makefile.kunit
+
 hv_storvsc-y			:= storvsc_drv.o
 
 sd_mod-objs	:= sd.o
diff --git a/drivers/scsi/Makefile.kunit b/drivers/scsi/Makefile.kunit
new file mode 100644
index 000000000000..3e98053b2709
--- /dev/null
+++ b/drivers/scsi/Makefile.kunit
@@ -0,0 +1 @@
+obj-$(CONFIG_SCSI_ERROR_TEST) += scsi_error_test.o
diff --git a/drivers/scsi/scsi_error_test.c b/drivers/scsi/scsi_error_test.c
new file mode 100644
index 000000000000..c35ac628065e
--- /dev/null
+++ b/drivers/scsi/scsi_error_test.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2023 Google LLC
+ */
+#include <kunit/test.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_driver.h>
+#include "scsi_priv.h"
+
+static struct kunit *kunit_test;
+
+static void uld_prepare_resubmit(struct list_head *cmd_list)
+{
+	/* This function must not be called. */
+	KUNIT_EXPECT_TRUE(kunit_test, false);
+}
+
+/*
+ * Verify that .eh_prepare_resubmit() is not called if use_zone_write_lock is
+ * true.
+ */
+static void test_prepare_resubmit1(struct kunit *test)
+{
+	static struct gendisk disk;
+	static struct request_queue q = {
+		.disk = &disk,
+		.limits = {
+			.driver_preserves_write_order = false,
+			.use_zone_write_lock = true,
+			.zoned = BLK_ZONED_HM,
+		}
+	};
+	static struct scsi_driver uld = {
+		.eh_prepare_resubmit = uld_prepare_resubmit,
+	};
+	static struct scsi_device dev = {
+		.request_queue = &q,
+		.sdev_gendev.driver = &uld.gendrv,
+	};
+	static struct rq_and_cmd {
+		struct request rq;
+		struct scsi_cmnd cmd;
+	} cmd1, cmd2;
+	LIST_HEAD(cmd_list);
+
+	BUILD_BUG_ON(scsi_cmd_to_rq(&cmd1.cmd) != &cmd1.rq);
+
+	disk.queue = &q;
+	cmd1 = (struct rq_and_cmd){
+		.rq = {
+			.q = &q,
+			.cmd_flags = REQ_OP_WRITE,
+			.__sector = 2,
+		},
+		.cmd.device = &dev,
+	};
+	cmd2 = cmd1;
+	cmd2.rq.__sector = 1;
+	list_add_tail(&cmd1.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd2.cmd.eh_entry, &cmd_list);
+
+	KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 2);
+	kunit_test = test;
+	scsi_call_prepare_resubmit(&cmd_list);
+	kunit_test = NULL;
+	KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 2);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next, &cmd1.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next, &cmd2.cmd.eh_entry);
+}
+
+static struct scsi_driver *uld1, *uld2, *uld3;
+
+static void uld1_prepare_resubmit(struct list_head *cmd_list)
+{
+	struct scsi_cmnd *cmd;
+
+	KUNIT_EXPECT_EQ(kunit_test, list_count_nodes(cmd_list), 2);
+	list_for_each_entry(cmd, cmd_list, eh_entry)
+		KUNIT_EXPECT_PTR_EQ(kunit_test, scsi_cmd_to_driver(cmd), uld1);
+}
+
+static void uld2_prepare_resubmit(struct list_head *cmd_list)
+{
+	struct scsi_cmnd *cmd;
+
+	KUNIT_EXPECT_EQ(kunit_test, list_count_nodes(cmd_list), 2);
+	list_for_each_entry(cmd, cmd_list, eh_entry)
+		KUNIT_EXPECT_PTR_EQ(kunit_test, scsi_cmd_to_driver(cmd), uld2);
+}
+
+static void test_prepare_resubmit2(struct kunit *test)
+{
+	static struct gendisk disk;
+	static struct request_queue q = {
+		.disk = &disk,
+		.limits = {
+			.driver_preserves_write_order = true,
+			.use_zone_write_lock = false,
+			.zoned = BLK_ZONED_HM,
+		}
+	};
+	static struct rq_and_cmd {
+		struct request rq;
+		struct scsi_cmnd cmd;
+	} cmd1, cmd2, cmd3, cmd4, cmd5, cmd6;
+	static struct scsi_device dev1, dev2, dev3;
+	struct scsi_driver *uld;
+	LIST_HEAD(cmd_list);
+
+	BUILD_BUG_ON(scsi_cmd_to_rq(&cmd1.cmd) != &cmd1.rq);
+
+	uld = kzalloc(3 * sizeof(uld), GFP_KERNEL);
+	uld1 = &uld[0];
+	uld1->eh_prepare_resubmit = uld1_prepare_resubmit;
+	uld2 = &uld[1];
+	uld2->eh_prepare_resubmit = uld2_prepare_resubmit;
+	uld3 = &uld[2];
+	disk.queue = &q;
+	dev1.sdev_gendev.driver = &uld1->gendrv;
+	dev1.request_queue = &q;
+	dev2.sdev_gendev.driver = &uld2->gendrv;
+	dev2.request_queue = &q;
+	dev3.sdev_gendev.driver = &uld3->gendrv;
+	dev3.request_queue = &q;
+	cmd1 = (struct rq_and_cmd){
+		.rq = {
+			.q = &q,
+			.cmd_flags = REQ_OP_WRITE,
+			.__sector = 3,
+		},
+		.cmd.device = &dev1,
+	};
+	cmd2 = cmd1;
+	cmd2.rq.__sector = 4;
+	cmd3 = (struct rq_and_cmd){
+		.rq = {
+			.q = &q,
+			.cmd_flags = REQ_OP_WRITE,
+			.__sector = 1,
+		},
+		.cmd.device = &dev2,
+	};
+	cmd4 = cmd3;
+	cmd4.rq.__sector = 2,
+	cmd5 = (struct rq_and_cmd){
+		.rq = {
+			.q = &q,
+			.cmd_flags = REQ_OP_WRITE,
+			.__sector = 5,
+		},
+		.cmd.device = &dev3,
+	};
+	cmd6 = cmd5;
+	cmd6.rq.__sector = 6;
+	list_add_tail(&cmd3.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd1.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd2.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd5.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd6.cmd.eh_entry, &cmd_list);
+	list_add_tail(&cmd4.cmd.eh_entry, &cmd_list);
+
+	KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 6);
+	kunit_test = test;
+	scsi_call_prepare_resubmit(&cmd_list);
+	kunit_test = NULL;
+	KUNIT_EXPECT_EQ(test, list_count_nodes(&cmd_list), 6);
+	KUNIT_EXPECT_TRUE(test, uld1 < uld2);
+	KUNIT_EXPECT_TRUE(test, uld2 < uld3);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next, &cmd1.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next, &cmd2.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next,
+			    &cmd3.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next,
+			    &cmd4.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next->next,
+			    &cmd5.cmd.eh_entry);
+	KUNIT_EXPECT_PTR_EQ(test, cmd_list.next->next->next->next->next->next,
+			    &cmd6.cmd.eh_entry);
+	kfree(uld);
+}
+
+static struct kunit_case prepare_resubmit_test_cases[] = {
+	KUNIT_CASE(test_prepare_resubmit1),
+	KUNIT_CASE(test_prepare_resubmit2),
+	{}
+};
+
+static struct kunit_suite prepare_resubmit_test_suite = {
+	.name = "prepare_resubmit",
+	.test_cases = prepare_resubmit_test_cases,
+};
+kunit_test_suite(prepare_resubmit_test_suite);
+
+MODULE_DESCRIPTION("scsi_call_prepare_resubmit() unit tests");
+MODULE_AUTHOR("Bart Van Assche");
+MODULE_LICENSE("GPL");

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 05/17] scsi: core: Add unit tests for scsi_call_prepare_resubmit() Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:13   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes Bart Van Assche
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley

Sort SCSI commands by LBA before the SCSI error handler resubmits
these commands. This is necessary when resubmitting zoned writes
(REQ_OP_WRITE) if multiple writes have been queued for a single zone.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3c668cfb146d..8a4b0874e7fe 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -47,6 +47,7 @@
 #include <linux/blkpg.h>
 #include <linux/blk-pm.h>
 #include <linux/delay.h>
+#include <linux/list_sort.h>
 #include <linux/major.h>
 #include <linux/mutex.h>
 #include <linux/string_helpers.h>
@@ -117,6 +118,7 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt);
 static int sd_done(struct scsi_cmnd *);
 static void sd_eh_reset(struct scsi_cmnd *);
 static int sd_eh_action(struct scsi_cmnd *, int);
+static void sd_prepare_resubmit(struct list_head *cmd_list);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
 
@@ -617,6 +619,7 @@ static struct scsi_driver sd_template = {
 	.done			= sd_done,
 	.eh_action		= sd_eh_action,
 	.eh_reset		= sd_eh_reset,
+	.eh_prepare_resubmit	= sd_prepare_resubmit,
 };
 
 /*
@@ -2018,6 +2021,38 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
 	return eh_disp;
 }
 
+static int sd_cmp_sector(void *priv, const struct list_head *_a,
+			 const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+	struct request *rq_a = scsi_cmd_to_rq(a);
+	struct request *rq_b = scsi_cmd_to_rq(b);
+	bool use_zwl_a = rq_a->q->limits.use_zone_write_lock;
+	bool use_zwl_b = rq_b->q->limits.use_zone_write_lock;
+
+	/*
+	 * Order the commands that need zone write locking after the commands
+	 * that do not need zone write locking. Order the commands that do not
+	 * need zone write locking by LBA. Do not reorder the commands that
+	 * need zone write locking. See also the comment above the list_sort()
+	 * definition.
+	 */
+	if (use_zwl_a || use_zwl_b)
+		return use_zwl_a > use_zwl_b;
+	return blk_rq_pos(rq_a) > blk_rq_pos(rq_b);
+}
+
+static void sd_prepare_resubmit(struct list_head *cmd_list)
+{
+	/*
+	 * Sort pending SCSI commands in starting sector order. This is
+	 * important if one of the SCSI devices associated with @shost is a
+	 * zoned block device for which zone write locking is disabled.
+	 */
+	list_sort(NULL, cmd_list, sd_cmp_sector);
+}
+
 static unsigned int sd_completed_bytes(struct scsi_cmnd *scmd)
 {
 	struct request *req = scsi_cmd_to_rq(scmd);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:21   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed Bart Van Assche
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley

If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
a starting LBA that differs from the write pointer, e.g. because zoned
writes have been reordered, then the storage device will respond with an
UNALIGNED WRITE COMMAND error. Send commands that failed with an
unaligned write error to the SCSI error handler if zone write locking is
disabled. The SCSI error handler will sort SCSI commands per LBA before
resubmitting these.

If zone write locking is disabled, increase the number of retries for
write commands sent to a sequential zone to the maximum number of
outstanding commands because in the worst case the number of times
reordered zoned writes have to be retried is (number of outstanding
writes per sequential zone) - 1.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 16 ++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  6 ++++++
 include/scsi/scsi.h       |  1 +
 4 files changed, 24 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4393a7fd8a07..69510e99ccfd 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -699,6 +699,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This may indicate that zoned writes
+		 * have been received by the device in the wrong order. If zone
+		 * write locking is disabled, retry after all pending commands
+		 * have completed.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    !req->q->limits.use_zone_write_lock &&
+		    blk_rq_is_seq_zoned_write(req)) {
+			SCSI_LOG_ERROR_RECOVERY(3,
+				sdev_printk(KERN_INFO, scmd->device,
+					    "Retrying unaligned write at LBA %#llx.\n",
+					    scsi_get_lba(scmd)));
+			return NEEDS_DELAYED_RETRY;
+		}
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 59176946ab56..69da8aee13df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
 	case ADD_TO_MLQUEUE:
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 		break;
+	case NEEDS_DELAYED_RETRY:
 	default:
 		scsi_eh_scmd_add(cmd);
 		break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8a4b0874e7fe..05baf5d1c24c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1238,6 +1238,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	/*
+	 * Increase the number of allowed retries for zoned writes if zone
+	 * write locking is disabled.
+	 */
+	if (!rq->q->limits.use_zone_write_lock && blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 
 	SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index ec093594ba53..6600db046227 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
  * Internal return values.
  */
 enum scsi_disposition {
+	NEEDS_DELAYED_RETRY	= 0x2000,
 	NEEDS_RETRY		= 0x2001,
 	SUCCESS			= 0x2002,
 	FAILED			= 0x2003,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (6 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:21   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Damien Le Moal, Ming Lei, James E.J. Bottomley

An I/O scheduler that serializes zoned writes is only needed if the SCSI
LLD does not preserve the write order. Hence only set
ELEVATOR_F_ZBD_SEQ_WRITE if the LLD does not preserve the write order.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd_zbc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index a25215507668..718b31bed878 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -955,7 +955,9 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 buf[SD_BUF_SIZE])
 
 	/* The drive satisfies the kernel restrictions: set it up */
 	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
-	blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
+	if (!q->limits.driver_preserves_write_order)
+		blk_queue_required_elevator_features(q,
+						     ELEVATOR_F_ZBD_SEQ_WRITE);
 	if (sdkp->zones_max_open == U32_MAX)
 		disk_set_max_open_zones(disk, 0);
 	else

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (7 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:25   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Douglas Gilbert, Damien Le Moal, Ming Lei,
	James E.J. Bottomley

Make it easier to test not using zone write locking by supporting
disabling zone write locking in the scsi_debug driver.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 9c0af50501f9..c44c523bde2c 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -832,6 +832,7 @@ static int dix_reads;
 static int dif_errors;
 
 /* ZBC global data */
+static bool sdeb_no_zwrl;
 static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
 static int sdeb_zbc_zone_cap_mb;
 static int sdeb_zbc_zone_size_mb;
@@ -5138,9 +5139,13 @@ static struct sdebug_dev_info *find_build_dev_info(struct scsi_device *sdev)
 
 static int scsi_debug_slave_alloc(struct scsi_device *sdp)
 {
+	struct request_queue *q = sdp->request_queue;
+
 	if (sdebug_verbose)
 		pr_info("slave_alloc <%u %u %u %llu>\n",
 		       sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
+	if (sdeb_no_zwrl)
+		q->limits.driver_preserves_write_order = true;
 	return 0;
 }
 
@@ -5738,6 +5743,7 @@ module_param_named(ndelay, sdebug_ndelay, int, S_IRUGO | S_IWUSR);
 module_param_named(no_lun_0, sdebug_no_lun_0, int, S_IRUGO | S_IWUSR);
 module_param_named(no_rwlock, sdebug_no_rwlock, bool, S_IRUGO | S_IWUSR);
 module_param_named(no_uld, sdebug_no_uld, int, S_IRUGO);
+module_param_named(no_zone_write_lock, sdeb_no_zwrl, bool, S_IRUGO);
 module_param_named(num_parts, sdebug_num_parts, int, S_IRUGO);
 module_param_named(num_tgts, sdebug_num_tgts, int, S_IRUGO | S_IWUSR);
 module_param_named(opt_blks, sdebug_opt_blks, int, S_IRUGO);
@@ -5812,6 +5818,8 @@ MODULE_PARM_DESC(ndelay, "response delay in nanoseconds (def=0 -> ignore)");
 MODULE_PARM_DESC(no_lun_0, "no LU number 0 (def=0 -> have lun 0)");
 MODULE_PARM_DESC(no_rwlock, "don't protect user data reads+writes (def=0)");
 MODULE_PARM_DESC(no_uld, "stop ULD (e.g. sd driver) attaching (def=0))");
+MODULE_PARM_DESC(no_zone_write_lock,
+		 "Disable serialization of zoned writes (def=0)");
 MODULE_PARM_DESC(num_parts, "number of partitions(def=0)");
 MODULE_PARM_DESC(num_tgts, "number of targets per host to simulate(def=1)");
 MODULE_PARM_DESC(opt_blks, "optimal transfer length in blocks (def=1024)");

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (8 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 11:29   ` Damien Le Moal
  2023-08-16 19:53 ` [PATCH v9 11/17] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Douglas Gilbert, Damien Le Moal, Ming Lei,
	James E.J. Bottomley

Allow user space software, e.g. a blktests test, to inject unaligned
write errors.

Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index c44c523bde2c..c92bd6d00249 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -181,6 +181,7 @@ static const char *sdebug_version_date = "20210520";
 #define SDEBUG_OPT_NO_CDB_NOISE		0x4000
 #define SDEBUG_OPT_HOST_BUSY		0x8000
 #define SDEBUG_OPT_CMD_ABORT		0x10000
+#define SDEBUG_OPT_UNALIGNED_WRITE	0x20000
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
 			      SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
@@ -188,7 +189,8 @@ static const char *sdebug_version_date = "20210520";
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
 				  SDEBUG_OPT_SHORT_TRANSFER | \
 				  SDEBUG_OPT_HOST_BUSY | \
-				  SDEBUG_OPT_CMD_ABORT)
+				  SDEBUG_OPT_CMD_ABORT | \
+				  SDEBUG_OPT_UNALIGNED_WRITE)
 #define SDEBUG_OPT_RECOV_DIF_DIX (SDEBUG_OPT_RECOVERED_ERR | \
 				  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR)
 
@@ -3587,6 +3589,14 @@ static int resp_write_dt0(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 	struct sdeb_store_info *sip = devip2sip(devip, true);
 	u8 *cmd = scp->cmnd;
 
+	if (unlikely(sdebug_opts & SDEBUG_OPT_UNALIGNED_WRITE &&
+		     atomic_read(&sdeb_inject_pending))) {
+		atomic_set(&sdeb_inject_pending, 0);
+		mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE,
+				UNALIGNED_WRITE_ASCQ);
+		return check_condition_result;
+	}
+
 	switch (cmd[0]) {
 	case WRITE_16:
 		ei_lba = 0;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 11/17] scsi: ufs: hisi: Rework the code that disables auto-hibernation
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (9 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation Bart Van Assche
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Bean Huo, Keoseong Park,
	Krzysztof Kozlowski, Avri Altman

The host driver link startup callback is called indirectly by
ufshcd_probe_hba(). That function applies the auto-hibernation
settings by writing hba->ahit into the auto-hibernation control
register. Simplify the code for disabling auto-hibernation by
setting hba->ahit instead of writing into the auto-hibernation
control register. This patch is part of an effort to move all
auto-hibernation register changes into the UFSHCI driver core.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/host/ufs-hisi.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/ufs/host/ufs-hisi.c b/drivers/ufs/host/ufs-hisi.c
index 5b3060cd0ab8..f2ec687121bb 100644
--- a/drivers/ufs/host/ufs-hisi.c
+++ b/drivers/ufs/host/ufs-hisi.c
@@ -142,7 +142,6 @@ static int ufs_hisi_link_startup_pre_change(struct ufs_hba *hba)
 	struct ufs_hisi_host *host = ufshcd_get_variant(hba);
 	int err;
 	uint32_t value;
-	uint32_t reg;
 
 	/* Unipro VS_mphy_disable */
 	ufshcd_dme_set(hba, UIC_ARG_MIB_SEL(0xD0C1, 0x0), 0x1);
@@ -232,9 +231,7 @@ static int ufs_hisi_link_startup_pre_change(struct ufs_hba *hba)
 		ufshcd_writel(hba, UFS_HCLKDIV_NORMAL_VALUE, UFS_REG_HCLKDIV);
 
 	/* disable auto H8 */
-	reg = ufshcd_readl(hba, REG_AUTO_HIBERNATE_IDLE_TIMER);
-	reg = reg & (~UFS_AHIT_AH8ITV_MASK);
-	ufshcd_writel(hba, reg, REG_AUTO_HIBERNATE_IDLE_TIMER);
+	hba->ahit = 0;
 
 	/* Unipro PA_Local_TX_LCC_Enable */
 	ufshcd_disable_host_tx_lcc(hba);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (10 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 11/17] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 18:40   ` Bao D. Nguyen
  2023-08-16 19:53 ` [PATCH v9 13/17] scsi: ufs: sprd: " Bart Van Assche
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Stanley Chu, James E.J. Bottomley,
	Matthias Brugger

Call ufshcd_auto_hibern8_update() instead of writing directly into the
auto-hibernation control register. This patch is part of an effort to
move all auto-hibernation register changes into the UFSHCI driver core.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/host/ufs-mediatek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ufs/host/ufs-mediatek.c b/drivers/ufs/host/ufs-mediatek.c
index e68b05976f9e..a3cf30e603ca 100644
--- a/drivers/ufs/host/ufs-mediatek.c
+++ b/drivers/ufs/host/ufs-mediatek.c
@@ -1252,7 +1252,7 @@ static void ufs_mtk_auto_hibern8_disable(struct ufs_hba *hba)
 	int ret;
 
 	/* disable auto-hibern8 */
-	ufshcd_writel(hba, 0, REG_AUTO_HIBERNATE_IDLE_TIMER);
+	ufshcd_auto_hibern8_update(hba, 0);
 
 	/* wait host return to idle state when auto-hibern8 off */
 	ufs_mtk_wait_idle_state(hba, 5);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 13/17] scsi: ufs: sprd: Rework the code for disabling auto-hibernation
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (11 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, James E.J. Bottomley, Orson Zhai, Baolin Wang,
	Chunyan Zhang, Zhe Wang, Adrian Hunter

Call ufshcd_auto_hibern8_update() instead of writing directly into the
auto-hibernation control register. This patch is part of an effort to
move all auto-hibernation register changes into the UFSHCI driver core.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/host/ufs-sprd.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/ufs/host/ufs-sprd.c b/drivers/ufs/host/ufs-sprd.c
index 2bad75dd6d58..a8e631bb695b 100644
--- a/drivers/ufs/host/ufs-sprd.c
+++ b/drivers/ufs/host/ufs-sprd.c
@@ -180,15 +180,8 @@ static int sprd_ufs_pwr_change_notify(struct ufs_hba *hba,
 static int ufs_sprd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op,
 			    enum ufs_notify_change_status status)
 {
-	unsigned long flags;
-
-	if (status == PRE_CHANGE) {
-		if (ufshcd_is_auto_hibern8_supported(hba)) {
-			spin_lock_irqsave(hba->host->host_lock, flags);
-			ufshcd_writel(hba, 0, REG_AUTO_HIBERNATE_IDLE_TIMER);
-			spin_unlock_irqrestore(hba->host->host_lock, flags);
-		}
-	}
+	if (status == PRE_CHANGE)
+		ufshcd_auto_hibern8_update(hba, 0);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (12 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 13/17] scsi: ufs: sprd: " Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 18:49   ` Bao D. Nguyen
  2023-08-16 19:53 ` [PATCH v9 15/17] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Bao D . Nguyen, James E.J. Bottomley,
	Stanley Chu, Can Guo, Bean Huo, Asutosh Das, Arthur Simchaev,
	Manivannan Sadhasivam, Po-Wen Kao, Eric Biggers, Keoseong Park,
	Daniil Lunev

Rename ufshcd_auto_hibern8_enable() into ufshcd_configure_auto_hibern8()
since this function can enable or disable auto-hibernation. Since
ufshcd_auto_hibern8_enable() is only used inside the UFSHCI driver core,
declare it static. Additionally, move the definition of this function to
just before its first caller.

Suggested-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 24 +++++++++++-------------
 include/ufs/ufshcd.h      |  1 -
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..f1bba459b46f 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4337,6 +4337,14 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
 
+static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
+{
+	if (!ufshcd_is_auto_hibern8_supported(hba))
+		return;
+
+	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
+}
+
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
 	unsigned long flags;
@@ -4356,21 +4364,13 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
-		ufshcd_auto_hibern8_enable(hba);
+		ufshcd_configure_auto_hibern8(hba);
 		ufshcd_release(hba);
 		ufshcd_rpm_put_sync(hba);
 	}
 }
 EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
 
-void ufshcd_auto_hibern8_enable(struct ufs_hba *hba)
-{
-	if (!ufshcd_is_auto_hibern8_supported(hba))
-		return;
-
-	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
-}
-
  /**
  * ufshcd_init_pwr_info - setting the POR (power on reset)
  * values in hba power info
@@ -8815,8 +8815,7 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool init_dev_params)
 
 	if (hba->ee_usr_mask)
 		ufshcd_write_ee_control(hba);
-	/* Enable Auto-Hibernate if configured */
-	ufshcd_auto_hibern8_enable(hba);
+	ufshcd_configure_auto_hibern8(hba);
 
 	ufshpb_toggle_state(hba, HPB_RESET, HPB_PRESENT);
 out:
@@ -9809,8 +9808,7 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
 	}
 
-	/* Enable Auto-Hibernate if configured */
-	ufshcd_auto_hibern8_enable(hba);
+	ufshcd_configure_auto_hibern8(hba);
 
 	ufshpb_resume(hba);
 	goto out;
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 6dc11fa0ebb1..040d66d99912 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -1363,7 +1363,6 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
 	return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
 }
 
-void ufshcd_auto_hibern8_enable(struct ufs_hba *hba);
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
 void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
 			     const struct ufs_dev_quirk *fixups);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 15/17] scsi: ufs: Simplify ufshcd_auto_hibern8_update()
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (13 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
  2023-08-16 19:53 ` [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
  16 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Can Guo, Bean Huo,
	Asutosh Das, Bao D. Nguyen, Arthur Simchaev

Calls to ufshcd_auto_hibern8_update() are already serialized: this
function is either called if user space software is not running
(preparing to suspend) or from a single sysfs store callback function.
Kernfs serializes sysfs .store() callbacks.

No functionality is changed. This patch makes the next patch in this
series easier to read.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index f1bba459b46f..39000c018d8b 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4347,21 +4347,13 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
 
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
-	unsigned long flags;
-	bool update = false;
+	const u32 cur_ahit = READ_ONCE(hba->ahit);
 
-	if (!ufshcd_is_auto_hibern8_supported(hba))
+	if (!ufshcd_is_auto_hibern8_supported(hba) || cur_ahit == ahit)
 		return;
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	if (hba->ahit != ahit) {
-		hba->ahit = ahit;
-		update = true;
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
-	if (update &&
-	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
+	WRITE_ONCE(hba->ahit, ahit);
+	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
 		ufshcd_configure_auto_hibern8(hba);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (14 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 15/17] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 18:50   ` Bao D. Nguyen
  2023-08-16 19:53 ` [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Bean Huo, Jinyoung Choi, Lu Hongfei,
	Daniil Lunev, Peter Wang, Stanley Chu, Manivannan Sadhasivam,
	Asutosh Das, Bao D. Nguyen, Arthur Simchaev, zhanghui,
	Po-Wen Kao, Eric Biggers, Keoseong Park

UFSHCI 3.0 controllers do not preserve the write order if auto-hibernation
is enabled. If the write order is not preserved, an I/O scheduler is
required to serialize zoned writes. Hence do not allow auto-hibernation
to be enabled without I/O scheduler if a zoned logical unit is present
and if the controller is operating in legacy mode. This patch has been
tested with the following shell script:

    show_ah8() {
        echo -n "auto_hibern8: "
        adb shell "cat /sys/devices/platform/13200000.ufs/auto_hibern8"
    }

    set_ah8() {
        local rc
        adb shell "echo $1 > /sys/devices/platform/13200000.ufs/auto_hibern8"
        rc=$?
        show_ah8
        return $rc
    }

    set_iosched() {
        adb shell "echo $1 >/sys/class/block/$zoned_bdev/queue/scheduler &&
    	           echo -n 'I/O scheduler: ' &&
	           cat /sys/class/block/sde/queue/scheduler"
    }

    adb root
    zoned_bdev=$(adb shell grep -lvw 0 /sys/class/block/sd*/queue/chunk_sectors |&
	         sed 's|/sys/class/block/||g;s|/queue/chunk_sectors||g')
    [ -n "$zoned_bdev" ]
    show_ah8
    set_ah8 0
    set_iosched none
    if set_ah8 150000; then
        echo "Error: enabled AH8 without I/O scheduler"
    fi
    set_iosched mq-deadline
    set_ah8 150000

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Can Guo <quic_cang@quicinc.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufs-sysfs.c   |  2 +-
 drivers/ufs/core/ufshcd-priv.h |  1 -
 drivers/ufs/core/ufshcd.c      | 60 ++++++++++++++++++++++++++++++++--
 include/ufs/ufshcd.h           |  2 +-
 4 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/drivers/ufs/core/ufs-sysfs.c b/drivers/ufs/core/ufs-sysfs.c
index 6c72075750dd..a693dea1bd18 100644
--- a/drivers/ufs/core/ufs-sysfs.c
+++ b/drivers/ufs/core/ufs-sysfs.c
@@ -203,7 +203,7 @@ static ssize_t auto_hibern8_store(struct device *dev,
 		goto out;
 	}
 
-	ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
+	ret = ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
 
 out:
 	up(&hba->host_sem);
diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
index 0f3bd943b58b..a2b74fbc2056 100644
--- a/drivers/ufs/core/ufshcd-priv.h
+++ b/drivers/ufs/core/ufshcd-priv.h
@@ -60,7 +60,6 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
 		      enum attr_idn idn, u8 index, u8 selector, u32 *attr_val);
 int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
 	enum flag_idn idn, u8 index, bool *flag_res);
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
 void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
 			  struct cq_entry *cqe);
 int ufshcd_mcq_init(struct ufs_hba *hba);
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 39000c018d8b..37d430d20939 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4337,6 +4337,29 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
 
+static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
+						bool preserves_write_order)
+{
+	struct scsi_device *sdev;
+
+	if (!preserves_write_order) {
+		shost_for_each_device(sdev, hba->host) {
+			struct request_queue *q = sdev->request_queue;
+
+			/*
+			 * This code does not check whether the attached I/O
+			 * scheduler serializes zoned writes
+			 * (ELEVATOR_F_ZBD_SEQ_WRITE) because this cannot be
+			 * checked from outside the block layer core.
+			 */
+			if (blk_queue_is_zoned(q) && !q->elevator)
+				return -EPERM;
+		}
+	}
+
+	return 0;
+}
+
 static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
 {
 	if (!ufshcd_is_auto_hibern8_supported(hba))
@@ -4345,13 +4368,37 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
 	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
 }
 
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
+/**
+ * ufshcd_auto_hibern8_update() - Modify the auto-hibernation control register
+ * @hba: per-adapter instance
+ * @ahit: New auto-hibernate settings. Includes the scale and the value of the
+ * auto-hibernation timer. See also the UFSHCI_AHIBERN8_TIMER_MASK and
+ * UFSHCI_AHIBERN8_SCALE_MASK constants.
+ *
+ * Note: enabling auto-hibernation if a zoned logical unit is present without
+ * attaching the mq-deadline scheduler first to the zoned logical unit may cause
+ * unaligned write errors for the zoned logical unit.
+ */
+int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
 	const u32 cur_ahit = READ_ONCE(hba->ahit);
+	bool prev_state, new_state;
+	int ret;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba) || cur_ahit == ahit)
-		return;
+		return 0;
 
+	prev_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, cur_ahit);
+	new_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, ahit);
+
+	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
+		/*
+		 * Auto-hibernation will be enabled for legacy UFSHCI mode.
+		 */
+		ret = ufshcd_update_preserves_write_order(hba, false);
+		if (ret)
+			return ret;
+	}
 	WRITE_ONCE(hba->ahit, ahit);
 	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
@@ -4360,6 +4407,15 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 		ufshcd_release(hba);
 		ufshcd_rpm_put_sync(hba);
 	}
+	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
+		/*
+		 * Auto-hibernation has been disabled.
+		 */
+		ret = ufshcd_update_preserves_write_order(hba, true);
+		WARN_ON_ONCE(ret);
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
 
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 040d66d99912..7ae071a6811c 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -1363,7 +1363,7 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
 	return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
 }
 
-void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
+int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
 void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
 			     const struct ufs_dev_quirk *fixups);
 #define SD_ASCII_STD true

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
                   ` (15 preceding siblings ...)
  2023-08-16 19:53 ` [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
@ 2023-08-16 19:53 ` Bart Van Assche
  2023-08-17 19:00   ` Bao D. Nguyen
  16 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-16 19:53 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Bao D. Nguyen, Arthur Simchaev

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one
  command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller. Hence, enable
  zone write locking if auto-hibernation is enabled.

This patch improves performance as follows on my test setup:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
  zone locking: 4x more IOPS for small writes.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Can Guo <quic_cang@quicinc.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 37d430d20939..7f5049a66cca 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4356,6 +4356,19 @@ static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
 				return -EPERM;
 		}
 	}
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	shost_for_each_device(sdev, hba->host) {
+		struct request_queue *q = sdev->request_queue;
+
+		blk_mq_freeze_queue_wait(q);
+		q->limits.driver_preserves_write_order = preserves_write_order;
+		blk_queue_required_elevator_features(q,
+			preserves_write_order ? 0 : ELEVATOR_F_ZBD_SEQ_WRITE);
+		if (q->disk)
+			disk_set_zoned(q->disk, q->limits.zoned);
+		blk_mq_unfreeze_queue(q);
+	}
 
 	return 0;
 }
@@ -4393,7 +4406,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 
 	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
 		/*
-		 * Auto-hibernation will be enabled for legacy UFSHCI mode.
+		 * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
+		 * the block layer that write requests may be reordered.
 		 */
 		ret = ufshcd_update_preserves_write_order(hba, false);
 		if (ret)
@@ -4409,7 +4423,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 	}
 	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
 		/*
-		 * Auto-hibernation has been disabled.
+		 * Auto-hibernation has been disabled. Tell the block layer that
+		 * the order of write requests is preserved.
 		 */
 		ret = ufshcd_update_preserves_write_order(hba, true);
 		WARN_ON_ONCE(ret);
@@ -5187,6 +5202,9 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 
 	ufshcd_hpb_configure(hba, sdev);
 
+	q->limits.driver_preserves_write_order =
+		!ufshcd_is_auto_hibern8_supported(hba) ||
+		FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
 		blk_queue_update_dma_alignment(q, SZ_4K - 1);

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 01/17] block: Introduce more member variables related to zone write locking
  2023-08-16 19:53 ` [PATCH v9 01/17] block: Introduce more member variables related to zone write locking Bart Van Assche
@ 2023-08-17 11:00   ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:00 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei

On 8/17/23 04:53, Bart Van Assche wrote:
> Many but not all storage controllers require serialization of zoned writes.
> Introduce two new request queue limit member variables related to write
> serialization. 'driver_preserves_write_order' allows block drivers to
> indicate that the order of write commands is preserved and hence that
> serialization of writes per zone is not required. 'use_zone_write_lock' is
> set by disk_set_zoned() if and only if the block device has zones and if
> the block driver does not preserve the order of write requests.
> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 02/17] block: Only use write locking if necessary
  2023-08-16 19:53 ` [PATCH v9 02/17] block: Only use write locking if necessary Bart Van Assche
@ 2023-08-17 11:01   ` Damien Le Moal
  2023-08-17 14:21     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei

On 8/17/23 04:53, Bart Van Assche wrote:
> Make blk_req_needs_zone_write_lock() return false if
> q->limits.use_zone_write_lock is false. Inline this function because it
> is short and because it is called from the hot path of the mq-deadline
> I/O scheduler.

Your are not actually inlining the function. Did you forget to move it to
blkdev.h ?

Otherwise, the change looks OK.

> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-zoned.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 112620985bff..d8a80cce832f 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -53,11 +53,16 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond)
>  EXPORT_SYMBOL_GPL(blk_zone_cond_str);
>  
>  /*
> - * Return true if a request is a write requests that needs zone write locking.
> + * Return true if a request is a write request that needs zone write locking.
>   */
>  bool blk_req_needs_zone_write_lock(struct request *rq)
>  {
> -	if (!rq->q->disk->seq_zones_wlock)
> +	struct request_queue *q = rq->q;
> +
> +	if (!q->limits.use_zone_write_lock)
> +		return false;
> +
> +	if (!q->disk->seq_zones_wlock)
>  		return false;
>  
>  	return blk_rq_is_seq_zoned_write(rq);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 03/17] block/mq-deadline: Only use zone locking if necessary
  2023-08-16 19:53 ` [PATCH v9 03/17] block/mq-deadline: Only use zone " Bart Van Assche
@ 2023-08-17 11:02   ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei

On 8/17/23 04:53, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one per zone for
> zoned writes has a significant negative performance impact on zoned UFS
> devices. Hence this patch that disables zone locking by the mq-deadline
> scheduler if the storage controller preserves the command order. This
> patch is based on the following assumptions:
> - It happens infrequently that zoned write requests are reordered by the
>   block layer.
> - The I/O priority of all write requests is the same per zone.
> - Either no I/O scheduler is used or an I/O scheduler is used that
>   serializes write requests per zone.
> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting
  2023-08-16 19:53 ` [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting Bart Van Assche
@ 2023-08-17 11:10   ` Damien Le Moal
  2023-08-17 14:26     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:10 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> Introduce the .eh_prepare_resubmit function pointer in struct scsi_driver.
> Make the error handler call .eh_prepare_resubmit() before resubmitting
> commands. A later patch will use this functionality to sort SCSI commands
> by LBA from inside the SCSI disk driver.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c  | 65 ++++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_priv.h   |  1 +
>  include/scsi/scsi_driver.h |  1 +
>  3 files changed, 67 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index c67cdcdc3ba8..4393a7fd8a07 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -27,6 +27,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/delay.h>
>  #include <linux/jiffies.h>
> +#include <linux/list_sort.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -2186,6 +2187,68 @@ void scsi_eh_ready_devs(struct Scsi_Host *shost,
>  }
>  EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
>  
> +/*
> + * Returns true if the commands in @done_q should be sorted in LBA order
> + * before being resubmitted.
> + */
> +static bool scsi_needs_sorting(struct list_head *done_q)
> +{
> +	struct scsi_cmnd *scmd;
> +
> +	list_for_each_entry(scmd, done_q, eh_entry) {
> +		struct request *rq = scsi_cmd_to_rq(scmd);
> +
> +		if (!rq->q->limits.use_zone_write_lock &&
> +		    blk_rq_is_seq_zoned_write(rq))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Comparison function that allows to sort SCSI commands by ULD driver.
> + */
> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
> +			const struct list_head *_b)
> +{
> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
> +
> +	/* See also the comment above the list_sort() definition. */
> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
> +}
> +
> +void scsi_call_prepare_resubmit(struct list_head *done_q)
> +{
> +	struct scsi_cmnd *scmd, *next;
> +
> +	if (!scsi_needs_sorting(done_q))
> +		return;

This is strange. The eh_prepare_resubmit callback is generic and its name does
not indicate anything related to sorting by LBAs. So this check would prevent
other actions not related to sorting by LBA. This should go away.

In patch 6, based on the device characteristics, the sd driver should decides
if it needs to set .eh_prepare_resubmit or not.

And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
return here before going through the list of commands to resubmit. Given that
this list should generally be small, going through it and doing nothing should
be OK though...

> +
> +	/* Sort pending SCSI commands by ULD. */
> +	list_sort(NULL, done_q, scsi_cmp_uld);
> +
> +	/*
> +	 * Call .eh_prepare_resubmit for each range of commands with identical
> +	 * ULD driver pointer.
> +	 */
> +	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
> +		struct scsi_driver *uld = scsi_cmd_to_driver(scmd);
> +		struct list_head *prev, uld_cmd_list;
> +
> +		while (&next->eh_entry != done_q &&
> +		       scsi_cmd_to_driver(next) == uld)
> +			next = list_next_entry(next, eh_entry);
> +		if (!uld->eh_prepare_resubmit)
> +			continue;
> +		prev = scmd->eh_entry.prev;
> +		list_cut_position(&uld_cmd_list, prev, next->eh_entry.prev);
> +		uld->eh_prepare_resubmit(&uld_cmd_list);
> +		list_splice(&uld_cmd_list, prev);
> +	}
> +}
> +
>  /**
>   * scsi_eh_flush_done_q - finish processed commands or retry them.
>   * @done_q:	list_head of processed commands.
> @@ -2194,6 +2257,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *next;
>  
> +	scsi_call_prepare_resubmit(done_q);
> +
>  	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
>  		list_del_init(&scmd->eh_entry);
>  		if (scsi_device_online(scmd->device) &&
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index f42388ecb024..df4af4645430 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -101,6 +101,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>  		      struct list_head *done_q);
>  bool scsi_noretry_cmd(struct scsi_cmnd *scmd);
>  void scsi_eh_done(struct scsi_cmnd *scmd);
> +void scsi_call_prepare_resubmit(struct list_head *done_q);
>  
>  /* scsi_lib.c */
>  extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
> index 4ce1988b2ba0..2b11be896eee 100644
> --- a/include/scsi/scsi_driver.h
> +++ b/include/scsi/scsi_driver.h
> @@ -18,6 +18,7 @@ struct scsi_driver {
>  	int (*done)(struct scsi_cmnd *);
>  	int (*eh_action)(struct scsi_cmnd *, int);
>  	void (*eh_reset)(struct scsi_cmnd *);
> +	void (*eh_prepare_resubmit)(struct list_head *cmd_list);
>  };
>  #define to_scsi_driver(drv) \
>  	container_of((drv), struct scsi_driver, gendrv)

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting
  2023-08-16 19:53 ` [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting Bart Van Assche
@ 2023-08-17 11:13   ` Damien Le Moal
  2023-08-17 14:34     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:13 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> Sort SCSI commands by LBA before the SCSI error handler resubmits
> these commands. This is necessary when resubmitting zoned writes
> (REQ_OP_WRITE) if multiple writes have been queued for a single zone.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/sd.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3c668cfb146d..8a4b0874e7fe 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -47,6 +47,7 @@
>  #include <linux/blkpg.h>
>  #include <linux/blk-pm.h>
>  #include <linux/delay.h>
> +#include <linux/list_sort.h>
>  #include <linux/major.h>
>  #include <linux/mutex.h>
>  #include <linux/string_helpers.h>
> @@ -117,6 +118,7 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>  static int sd_done(struct scsi_cmnd *);
>  static void sd_eh_reset(struct scsi_cmnd *);
>  static int sd_eh_action(struct scsi_cmnd *, int);
> +static void sd_prepare_resubmit(struct list_head *cmd_list);
>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>  static void scsi_disk_release(struct device *cdev);
>  
> @@ -617,6 +619,7 @@ static struct scsi_driver sd_template = {
>  	.done			= sd_done,
>  	.eh_action		= sd_eh_action,
>  	.eh_reset		= sd_eh_reset,
> +	.eh_prepare_resubmit	= sd_prepare_resubmit,
>  };
>  
>  /*
> @@ -2018,6 +2021,38 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>  	return eh_disp;
>  }
>  
> +static int sd_cmp_sector(void *priv, const struct list_head *_a,
> +			 const struct list_head *_b)
> +{
> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
> +	struct request *rq_a = scsi_cmd_to_rq(a);
> +	struct request *rq_b = scsi_cmd_to_rq(b);
> +	bool use_zwl_a = rq_a->q->limits.use_zone_write_lock;
> +	bool use_zwl_b = rq_b->q->limits.use_zone_write_lock;
> +
> +	/*
> +	 * Order the commands that need zone write locking after the commands
> +	 * that do not need zone write locking. Order the commands that do not
> +	 * need zone write locking by LBA. Do not reorder the commands that
> +	 * need zone write locking. See also the comment above the list_sort()
> +	 * definition.
> +	 */
> +	if (use_zwl_a || use_zwl_b)
> +		return use_zwl_a > use_zwl_b;
> +	return blk_rq_pos(rq_a) > blk_rq_pos(rq_b);
> +}
> +
> +static void sd_prepare_resubmit(struct list_head *cmd_list)
> +{
> +	/*
> +	 * Sort pending SCSI commands in starting sector order. This is
> +	 * important if one of the SCSI devices associated with @shost is a
> +	 * zoned block device for which zone write locking is disabled.
> +	 */
> +	list_sort(NULL, cmd_list, sd_cmp_sector);

We should not need to do this for regular devices or zoned devices using zone
write locking. So returning doing nothing would be better but the callback
lacks a pointer to the scsi_device to test that.

> +}
> +
>  static unsigned int sd_completed_bytes(struct scsi_cmnd *scmd)
>  {
>  	struct request *req = scsi_cmd_to_rq(scmd);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes
  2023-08-16 19:53 ` [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes Bart Van Assche
@ 2023-08-17 11:21   ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:21 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
> a starting LBA that differs from the write pointer, e.g. because zoned
> writes have been reordered, then the storage device will respond with an
> UNALIGNED WRITE COMMAND error. Send commands that failed with an
> unaligned write error to the SCSI error handler if zone write locking is
> disabled. The SCSI error handler will sort SCSI commands per LBA before
> resubmitting these.
> 
> If zone write locking is disabled, increase the number of retries for
> write commands sent to a sequential zone to the maximum number of
> outstanding commands because in the worst case the number of times
> reordered zoned writes have to be retried is (number of outstanding
> writes per sequential zone) - 1.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed
  2023-08-16 19:53 ` [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed Bart Van Assche
@ 2023-08-17 11:21   ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:21 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> An I/O scheduler that serializes zoned writes is only needed if the SCSI
> LLD does not preserve the write order. Hence only set
> ELEVATOR_F_ZBD_SEQ_WRITE if the LLD does not preserve the write order.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking
  2023-08-16 19:53 ` [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
@ 2023-08-17 11:25   ` Damien Le Moal
  2023-08-17 14:35     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:25 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Douglas Gilbert, Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> Make it easier to test not using zone write locking by supporting
> disabling zone write locking in the scsi_debug driver.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Douglas Gilbert <dgilbert@interlog.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_debug.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index 9c0af50501f9..c44c523bde2c 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -832,6 +832,7 @@ static int dix_reads;
>  static int dif_errors;
>  
>  /* ZBC global data */
> +static bool sdeb_no_zwrl;
>  static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
>  static int sdeb_zbc_zone_cap_mb;
>  static int sdeb_zbc_zone_size_mb;
> @@ -5138,9 +5139,13 @@ static struct sdebug_dev_info *find_build_dev_info(struct scsi_device *sdev)
>  
>  static int scsi_debug_slave_alloc(struct scsi_device *sdp)
>  {
> +	struct request_queue *q = sdp->request_queue;
> +
>  	if (sdebug_verbose)
>  		pr_info("slave_alloc <%u %u %u %llu>\n",
>  		       sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
> +	if (sdeb_no_zwrl)
> +		q->limits.driver_preserves_write_order = true;

The option is named and discribed as "no_zone_write_lock", which is a block
layer concept. Given that this sets driver_preserves_write_order and does not
touch the use_zone_write_locking limit, I think it is better to name the option
"preserve_write_order" (or similar) to avoid confusion.

>  	return 0;
>  }
>  
> @@ -5738,6 +5743,7 @@ module_param_named(ndelay, sdebug_ndelay, int, S_IRUGO | S_IWUSR);
>  module_param_named(no_lun_0, sdebug_no_lun_0, int, S_IRUGO | S_IWUSR);
>  module_param_named(no_rwlock, sdebug_no_rwlock, bool, S_IRUGO | S_IWUSR);
>  module_param_named(no_uld, sdebug_no_uld, int, S_IRUGO);
> +module_param_named(no_zone_write_lock, sdeb_no_zwrl, bool, S_IRUGO);
>  module_param_named(num_parts, sdebug_num_parts, int, S_IRUGO);
>  module_param_named(num_tgts, sdebug_num_tgts, int, S_IRUGO | S_IWUSR);
>  module_param_named(opt_blks, sdebug_opt_blks, int, S_IRUGO);
> @@ -5812,6 +5818,8 @@ MODULE_PARM_DESC(ndelay, "response delay in nanoseconds (def=0 -> ignore)");
>  MODULE_PARM_DESC(no_lun_0, "no LU number 0 (def=0 -> have lun 0)");
>  MODULE_PARM_DESC(no_rwlock, "don't protect user data reads+writes (def=0)");
>  MODULE_PARM_DESC(no_uld, "stop ULD (e.g. sd driver) attaching (def=0))");
> +MODULE_PARM_DESC(no_zone_write_lock,
> +		 "Disable serialization of zoned writes (def=0)");
>  MODULE_PARM_DESC(num_parts, "number of partitions(def=0)");
>  MODULE_PARM_DESC(num_tgts, "number of targets per host to simulate(def=1)");
>  MODULE_PARM_DESC(opt_blks, "optimal transfer length in blocks (def=1024)");

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors
  2023-08-16 19:53 ` [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
@ 2023-08-17 11:29   ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-17 11:29 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Douglas Gilbert, Ming Lei, James E.J. Bottomley

On 8/17/23 04:53, Bart Van Assche wrote:
> Allow user space software, e.g. a blktests test, to inject unaligned
> write errors.
> 
> Acked-by: Douglas Gilbert <dgilbert@interlog.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 02/17] block: Only use write locking if necessary
  2023-08-17 11:01   ` Damien Le Moal
@ 2023-08-17 14:21     ` Bart Van Assche
  0 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 14:21 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei

On 8/17/23 04:01, Damien Le Moal wrote:
> On 8/17/23 04:53, Bart Van Assche wrote:
>> Make blk_req_needs_zone_write_lock() return false if
>> q->limits.use_zone_write_lock is false. Inline this function because it
>> is short and because it is called from the hot path of the mq-deadline
>> I/O scheduler.
> 
> Your are not actually inlining the function. Did you forget to move it to
> blkdev.h ?

I considered inlining but eventually decided not to inline 
blk_req_needs_zone_write_lock(). I will update the patch description.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting
  2023-08-17 11:10   ` Damien Le Moal
@ 2023-08-17 14:26     ` Bart Van Assche
  2023-08-18  2:38       ` Damien Le Moal
  0 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 14:26 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:10, Damien Le Moal wrote:
> On 8/17/23 04:53, Bart Van Assche wrote:
>> +/*
>> + * Returns true if the commands in @done_q should be sorted in LBA order
>> + * before being resubmitted.
>> + */
>> +static bool scsi_needs_sorting(struct list_head *done_q)
>> +{
>> +	struct scsi_cmnd *scmd;
>> +
>> +	list_for_each_entry(scmd, done_q, eh_entry) {
>> +		struct request *rq = scsi_cmd_to_rq(scmd);
>> +
>> +		if (!rq->q->limits.use_zone_write_lock &&
>> +		    blk_rq_is_seq_zoned_write(rq))
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>> + */
>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>> +			const struct list_head *_b)
>> +{
>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>> +
>> +	/* See also the comment above the list_sort() definition. */
>> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
>> +}
>> +
>> +void scsi_call_prepare_resubmit(struct list_head *done_q)
>> +{
>> +	struct scsi_cmnd *scmd, *next;
>> +
>> +	if (!scsi_needs_sorting(done_q))
>> +		return;
> 
> This is strange. The eh_prepare_resubmit callback is generic and its name does
> not indicate anything related to sorting by LBAs. So this check would prevent
> other actions not related to sorting by LBA. This should go away.
> 
> In patch 6, based on the device characteristics, the sd driver should decides
> if it needs to set .eh_prepare_resubmit or not.
> 
> And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
> return here before going through the list of commands to resubmit. Given that
> this list should generally be small, going through it and doing nothing should
> be OK though...

I can add a eh_prepare_resubmit == NULL check early in 
scsi_call_prepare_resubmit(). Regarding the code inside 
scsi_needs_sorting(), how about moving that code into an additional 
callback function, e.g. eh_needs_prepare_resubmit? Setting 
.eh_prepare_resubmit depending on the zone model would prevent 
constification of struct scsi_driver.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting
  2023-08-17 11:13   ` Damien Le Moal
@ 2023-08-17 14:34     ` Bart Van Assche
  2023-08-18  2:41       ` Damien Le Moal
  0 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 14:34 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 8/17/23 04:13, Damien Le Moal wrote:
> On 8/17/23 04:53, Bart Van Assche wrote:
>> +static int sd_cmp_sector(void *priv, const struct list_head *_a,
>> +			 const struct list_head *_b)
>> +{
>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>> +	struct request *rq_a = scsi_cmd_to_rq(a);
>> +	struct request *rq_b = scsi_cmd_to_rq(b);
>> +	bool use_zwl_a = rq_a->q->limits.use_zone_write_lock;
>> +	bool use_zwl_b = rq_b->q->limits.use_zone_write_lock;
>> +
>> +	/*
>> +	 * Order the commands that need zone write locking after the commands
>> +	 * that do not need zone write locking. Order the commands that do not
>> +	 * need zone write locking by LBA. Do not reorder the commands that
>> +	 * need zone write locking. See also the comment above the list_sort()
>> +	 * definition.
>> +	 */
>> +	if (use_zwl_a || use_zwl_b)
>> +		return use_zwl_a > use_zwl_b;
>> +	return blk_rq_pos(rq_a) > blk_rq_pos(rq_b);
>> +}
>> +
>> +static void sd_prepare_resubmit(struct list_head *cmd_list)
>> +{
>> +	/*
>> +	 * Sort pending SCSI commands in starting sector order. This is
>> +	 * important if one of the SCSI devices associated with @shost is a
>> +	 * zoned block device for which zone write locking is disabled.
>> +	 */
>> +	list_sort(NULL, cmd_list, sd_cmp_sector);
> 
> We should not need to do this for regular devices or zoned devices using zone
> write locking. So returning doing nothing would be better but the callback
> lacks a pointer to the scsi_device to test that.

@cmd_list may include SCSI commands for multiple SCSI devices so I'm not 
sure which SCSI device pointer you are referring to in the above comment?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking
  2023-08-17 11:25   ` Damien Le Moal
@ 2023-08-17 14:35     ` Bart Van Assche
  0 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 14:35 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Douglas Gilbert, Ming Lei, James E.J. Bottomley

On 8/17/23 04:25, Damien Le Moal wrote:
>>   static int scsi_debug_slave_alloc(struct scsi_device *sdp)
>>   {
>> +	struct request_queue *q = sdp->request_queue;
>> +
>>   	if (sdebug_verbose)
>>   		pr_info("slave_alloc <%u %u %u %llu>\n",
>>   		       sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
>> +	if (sdeb_no_zwrl)
>> +		q->limits.driver_preserves_write_order = true;
> 
> The option is named and discribed as "no_zone_write_lock", which is a block
> layer concept. Given that this sets driver_preserves_write_order and does not
> touch the use_zone_write_locking limit, I think it is better to name the option
> "preserve_write_order" (or similar) to avoid confusion.

I will make this change.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation
  2023-08-16 19:53 ` [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation Bart Van Assche
@ 2023-08-17 18:40   ` Bao D. Nguyen
  2023-08-17 19:13     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 18:40 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Stanley Chu, James E.J. Bottomley, Matthias Brugger

On 8/16/2023 12:53 PM, Bart Van Assche wrote:
> Call ufshcd_auto_hibern8_update() instead of writing directly into the
> auto-hibernation control register. This patch is part of an effort to
> move all auto-hibernation register changes into the UFSHCI driver core.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/host/ufs-mediatek.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ufs/host/ufs-mediatek.c b/drivers/ufs/host/ufs-mediatek.c
> index e68b05976f9e..a3cf30e603ca 100644
> --- a/drivers/ufs/host/ufs-mediatek.c
> +++ b/drivers/ufs/host/ufs-mediatek.c
> @@ -1252,7 +1252,7 @@ static void ufs_mtk_auto_hibern8_disable(struct ufs_hba *hba)
>   	int ret;
>   
>   	/* disable auto-hibern8 */
> -	ufshcd_writel(hba, 0, REG_AUTO_HIBERNATE_IDLE_TIMER);
> +	ufshcd_auto_hibern8_update(hba, 0);
Since you now use ufshcd_auto_hibern8_update(), the caller should not 
need to check for ufshcd_is_auto_hibern8_supported() because this is 
already part of the hibern8_update(). Suggest remove the if statement 
from the caller.

	if (ufshcd_is_auto_hibern8_supported(hba))
		ufs_mtk_auto_hibern8_disable(hba);

>   
>   	/* wait host return to idle state when auto-hibern8 off */
>   	ufs_mtk_wait_idle_state(hba, 5);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  2023-08-16 19:53 ` [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
@ 2023-08-17 18:49   ` Bao D. Nguyen
  2023-08-17 19:16     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 18:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	James E.J. Bottomley, Stanley Chu, Can Guo, Bean Huo,
	Asutosh Das, Arthur Simchaev, Manivannan Sadhasivam, Po-Wen Kao,
	Eric Biggers, Keoseong Park, Daniil Lunev

On 8/16/2023 12:53 PM, Bart Van Assche wrote:
> Rename ufshcd_auto_hibern8_enable() into ufshcd_configure_auto_hibern8()
> since this function can enable or disable auto-hibernation. Since
> ufshcd_auto_hibern8_enable() is only used inside the UFSHCI driver core,
> declare it static. Additionally, move the definition of this function to
> just before its first caller.
> 
> Suggested-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 24 +++++++++++-------------
>   include/ufs/ufshcd.h      |  1 -
>   2 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 129446775796..f1bba459b46f 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4337,6 +4337,14 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
>   
> +static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
> +{
> +	if (!ufshcd_is_auto_hibern8_supported(hba))
> +		return;
> +
> +	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
> +}
> +
>   void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   {
>   	unsigned long flags;
> @@ -4356,21 +4364,13 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
>   		ufshcd_rpm_get_sync(hba);
>   		ufshcd_hold(hba);
> -		ufshcd_auto_hibern8_enable(hba);
> +		ufshcd_configure_auto_hibern8(hba);
>   		ufshcd_release(hba);
>   		ufshcd_rpm_put_sync(hba);
>   	}
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
>   
> -void ufshcd_auto_hibern8_enable(struct ufs_hba *hba)
> -{
> -	if (!ufshcd_is_auto_hibern8_supported(hba))
> -		return;
> -
> -	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
> -}
> -
>    /**
>    * ufshcd_init_pwr_info - setting the POR (power on reset)
>    * values in hba power info
> @@ -8815,8 +8815,7 @@ static int ufshcd_probe_hba(struct ufs_hba *hba, bool init_dev_params)
>   
>   	if (hba->ee_usr_mask)
>   		ufshcd_write_ee_control(hba);
> -	/* Enable Auto-Hibernate if configured */
> -	ufshcd_auto_hibern8_enable(hba);
> +	ufshcd_configure_auto_hibern8(hba);
>   
>   	ufshpb_toggle_state(hba, HPB_RESET, HPB_PRESENT);
>   out:
> @@ -9809,8 +9808,7 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>   		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
>   	}
>   
> -	/* Enable Auto-Hibernate if configured */
> -	ufshcd_auto_hibern8_enable(hba);
> +	ufshcd_configure_auto_hibern8(hba);
Is it possible to have a race between sysfs and syspend/resume trying to 
update the auto_hibern8?

>   
>   	ufshpb_resume(hba);
>   	goto out;
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index 6dc11fa0ebb1..040d66d99912 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -1363,7 +1363,6 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
>   	return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
>   }
>   
> -void ufshcd_auto_hibern8_enable(struct ufs_hba *hba);
>   void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
>   void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
>   			     const struct ufs_dev_quirk *fixups);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler
  2023-08-16 19:53 ` [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
@ 2023-08-17 18:50   ` Bao D. Nguyen
  2023-08-17 19:18     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 18:50 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Bean Huo, Jinyoung Choi, Lu Hongfei,
	Daniil Lunev, Peter Wang, Stanley Chu, Manivannan Sadhasivam,
	Asutosh Das, Arthur Simchaev, zhanghui, Po-Wen Kao, Eric Biggers,
	Keoseong Park

On 8/16/2023 12:53 PM, Bart Van Assche wrote:
> UFSHCI 3.0 controllers do not preserve the write order if auto-hibernation
> is enabled. If the write order is not preserved, an I/O scheduler is
> required to serialize zoned writes. Hence do not allow auto-hibernation
> to be enabled without I/O scheduler if a zoned logical unit is present
> and if the controller is operating in legacy mode. This patch has been
> tested with the following shell script:
> 
>      show_ah8() {
>          echo -n "auto_hibern8: "
>          adb shell "cat /sys/devices/platform/13200000.ufs/auto_hibern8"
>      }
> 
>      set_ah8() {
>          local rc
>          adb shell "echo $1 > /sys/devices/platform/13200000.ufs/auto_hibern8"
>          rc=$?
>          show_ah8
>          return $rc
>      }
> 
>      set_iosched() {
>          adb shell "echo $1 >/sys/class/block/$zoned_bdev/queue/scheduler &&
>      	           echo -n 'I/O scheduler: ' &&
> 	           cat /sys/class/block/sde/queue/scheduler"
>      }
> 
>      adb root
>      zoned_bdev=$(adb shell grep -lvw 0 /sys/class/block/sd*/queue/chunk_sectors |&
> 	         sed 's|/sys/class/block/||g;s|/queue/chunk_sectors||g')
>      [ -n "$zoned_bdev" ]
>      show_ah8
>      set_ah8 0
>      set_iosched none
>      if set_ah8 150000; then
>          echo "Error: enabled AH8 without I/O scheduler"
>      fi
>      set_iosched mq-deadline
>      set_ah8 150000
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Can Guo <quic_cang@quicinc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufs-sysfs.c   |  2 +-
>   drivers/ufs/core/ufshcd-priv.h |  1 -
>   drivers/ufs/core/ufshcd.c      | 60 ++++++++++++++++++++++++++++++++--
>   include/ufs/ufshcd.h           |  2 +-
>   4 files changed, 60 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufs-sysfs.c b/drivers/ufs/core/ufs-sysfs.c
> index 6c72075750dd..a693dea1bd18 100644
> --- a/drivers/ufs/core/ufs-sysfs.c
> +++ b/drivers/ufs/core/ufs-sysfs.c
> @@ -203,7 +203,7 @@ static ssize_t auto_hibern8_store(struct device *dev,
>   		goto out;
>   	}
>   
> -	ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
> +	ret = ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
>   
>   out:
>   	up(&hba->host_sem);
> diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
> index 0f3bd943b58b..a2b74fbc2056 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -60,7 +60,6 @@ int ufshcd_query_attr(struct ufs_hba *hba, enum query_opcode opcode,
>   		      enum attr_idn idn, u8 index, u8 selector, u32 *attr_val);
>   int ufshcd_query_flag(struct ufs_hba *hba, enum query_opcode opcode,
>   	enum flag_idn idn, u8 index, bool *flag_res);
> -void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
>   void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
>   			  struct cq_entry *cqe);
>   int ufshcd_mcq_init(struct ufs_hba *hba);
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 39000c018d8b..37d430d20939 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4337,6 +4337,29 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
>   
> +static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
> +						bool preserves_write_order)
> +{
> +	struct scsi_device *sdev;
> +
> +	if (!preserves_write_order) {
> +		shost_for_each_device(sdev, hba->host) {
> +			struct request_queue *q = sdev->request_queue;
> +
> +			/*
> +			 * This code does not check whether the attached I/O
> +			 * scheduler serializes zoned writes
> +			 * (ELEVATOR_F_ZBD_SEQ_WRITE) because this cannot be
> +			 * checked from outside the block layer core.
> +			 */
> +			if (blk_queue_is_zoned(q) && !q->elevator)
> +				return -EPERM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
>   {
>   	if (!ufshcd_is_auto_hibern8_supported(hba))
> @@ -4345,13 +4368,37 @@ static void ufshcd_configure_auto_hibern8(struct ufs_hba *hba)
>   	ufshcd_writel(hba, hba->ahit, REG_AUTO_HIBERNATE_IDLE_TIMER);
>   }
>   
> -void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
> +/**
> + * ufshcd_auto_hibern8_update() - Modify the auto-hibernation control register
> + * @hba: per-adapter instance
> + * @ahit: New auto-hibernate settings. Includes the scale and the value of the
> + * auto-hibernation timer. See also the UFSHCI_AHIBERN8_TIMER_MASK and
> + * UFSHCI_AHIBERN8_SCALE_MASK constants.
> + *
> + * Note: enabling auto-hibernation if a zoned logical unit is present without
> + * attaching the mq-deadline scheduler first to the zoned logical unit may cause
> + * unaligned write errors for the zoned logical unit.
> + */
Please improve this "Note:". It seems like a run-on sentence and not 
very clear.


> +int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   {
>   	const u32 cur_ahit = READ_ONCE(hba->ahit);
> +	bool prev_state, new_state;
> +	int ret;
>   
>   	if (!ufshcd_is_auto_hibern8_supported(hba) || cur_ahit == ahit)
> -		return;
> +		return 0;
>   
> +	prev_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, cur_ahit);
> +	new_state = FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, ahit);
> +
> +	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
> +		/*
> +		 * Auto-hibernation will be enabled for legacy UFSHCI mode.
> +		 */
> +		ret = ufshcd_update_preserves_write_order(hba, false);
> +		if (ret)
> +			return ret;
> +	}
>   	WRITE_ONCE(hba->ahit, ahit);
>   	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
>   		ufshcd_rpm_get_sync(hba);
> @@ -4360,6 +4407,15 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   		ufshcd_release(hba);
>   		ufshcd_rpm_put_sync(hba);
>   	}
> +	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
> +		/*
> +		 * Auto-hibernation has been disabled.
> +		 */
> +		ret = ufshcd_update_preserves_write_order(hba, true);
> +		WARN_ON_ONCE(ret);
> +	}
> +
> +	return 0;
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
>   
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index 040d66d99912..7ae071a6811c 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -1363,7 +1363,7 @@ static inline int ufshcd_disable_host_tx_lcc(struct ufs_hba *hba)
>   	return ufshcd_dme_set(hba, UIC_ARG_MIB(PA_LOCAL_TX_LCC_ENABLE), 0);
>   }
>   
> -void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
> +int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit);
>   void ufshcd_fixup_dev_quirks(struct ufs_hba *hba,
>   			     const struct ufs_dev_quirk *fixups);
>   #define SD_ASCII_STD true


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-16 19:53 ` [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
@ 2023-08-17 19:00   ` Bao D. Nguyen
  2023-08-17 19:34     ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 19:00 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>  From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
> 
>  From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
> 
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
>     pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
> 
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
> 
> Notes:
> - For legacy mode this is only correct if the host submits one
>    command at a time. The UFS driver does this.
> - Also in legacy mode, the command order is not preserved if
>    auto-hibernation is enabled in the UFS controller. Hence, enable
>    zone write locking if auto-hibernation is enabled.
> 
> This patch improves performance as follows on my test setup:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
>    zone locking: 4x more IOPS for small writes.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Can Guo <quic_cang@quicinc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 37d430d20939..7f5049a66cca 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4356,6 +4356,19 @@ static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
>   				return -EPERM;
>   		}
>   	}
> +	shost_for_each_device(sdev, hba->host)
> +		blk_freeze_queue_start(sdev->request_queue);
> +	shost_for_each_device(sdev, hba->host) {
> +		struct request_queue *q = sdev->request_queue;
> +
> +		blk_mq_freeze_queue_wait(q);
> +		q->limits.driver_preserves_write_order = preserves_write_order;
> +		blk_queue_required_elevator_features(q,
> +			preserves_write_order ? 0 : ELEVATOR_F_ZBD_SEQ_WRITE);
> +		if (q->disk)
> +			disk_set_zoned(q->disk, q->limits.zoned);
> +		blk_mq_unfreeze_queue(q);
> +	}
>   
>   	return 0;
>   }
> @@ -4393,7 +4406,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   
>   	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
>   		/*
> -		 * Auto-hibernation will be enabled for legacy UFSHCI mode.
> +		 * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
> +		 * the block layer that write requests may be reordered.
>   		 */
>   		ret = ufshcd_update_preserves_write_order(hba, false);
>   		if (ret)
> @@ -4409,7 +4423,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   	}
>   	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
>   		/*
> -		 * Auto-hibernation has been disabled.
> +		 * Auto-hibernation has been disabled. Tell the block layer that
> +		 * the order of write requests is preserved.
>   		 */
>   		ret = ufshcd_update_preserves_write_order(hba, true);
>   		WARN_ON_ONCE(ret);
> @@ -5187,6 +5202,9 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
>   
>   	ufshcd_hpb_configure(hba, sdev);
>   
> +	q->limits.driver_preserves_write_order =
> +		!ufshcd_is_auto_hibern8_supported(hba) ||
> +		FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
In legacy SDB mode with auto-hibernation enabled, the default setting 
for the driver_preserves_write_order = false.

Using the default setting, it may be missing this check that is part of 
the ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().

Auto-hibernation should not be enabled by default unless these 
conditions are met?
	if (blk_queue_is_zoned(q) && !q->elevator)
		return -EPERM;


>   	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>   	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
>   		blk_queue_update_dma_alignment(q, SZ_4K - 1);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation
  2023-08-17 18:40   ` Bao D. Nguyen
@ 2023-08-17 19:13     ` Bart Van Assche
  0 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 19:13 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Stanley Chu, James E.J. Bottomley, Matthias Brugger

On 8/17/23 11:40, Bao D. Nguyen wrote:
> On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>> diff --git a/drivers/ufs/host/ufs-mediatek.c b/drivers/ufs/host/ufs-mediatek.c
>> index e68b05976f9e..a3cf30e603ca 100644
>> --- a/drivers/ufs/host/ufs-mediatek.c
>> +++ b/drivers/ufs/host/ufs-mediatek.c
>> @@ -1252,7 +1252,7 @@ static void ufs_mtk_auto_hibern8_disable(struct ufs_hba *hba)
>>       int ret;
>>       /* disable auto-hibern8 */
>> -    ufshcd_writel(hba, 0, REG_AUTO_HIBERNATE_IDLE_TIMER);
>> +    ufshcd_auto_hibern8_update(hba, 0);
> 
> Since you now use ufshcd_auto_hibern8_update(), the caller should not need to check for ufshcd_is_auto_hibern8_supported() because this is already part of the hibern8_update(). Suggest remove the if statement from the caller.
 >
> 
>      if (ufshcd_is_auto_hibern8_supported(hba))
>          ufs_mtk_auto_hibern8_disable(hba);
> 
>>       /* wait host return to idle state when auto-hibern8 off */
>>       ufs_mtk_wait_idle_state(hba, 5);

I think that check in the caller is still useful to skip the wait loop in
ufs_mtk_auto_hibern8_disable() if auto-hibernation is not supported.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  2023-08-17 18:49   ` Bao D. Nguyen
@ 2023-08-17 19:16     ` Bart Van Assche
  2023-08-17 21:48       ` Bao D. Nguyen
  0 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 19:16 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	James E.J. Bottomley, Stanley Chu, Can Guo, Bean Huo,
	Asutosh Das, Arthur Simchaev, Manivannan Sadhasivam, Po-Wen Kao,
	Eric Biggers, Keoseong Park, Daniil Lunev

On 8/17/23 11:49, Bao D. Nguyen wrote:
> On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>> -    /* Enable Auto-Hibernate if configured */
>> -    ufshcd_auto_hibern8_enable(hba);
>> +    ufshcd_configure_auto_hibern8(hba);
>
> Is it possible to have a race between sysfs and syspend/resume trying to update the auto_hibern8?

Only user-space software should write into sysfs. Kernel code should
not do this. User-space code is paused before the block driver suspend
callbacks are invoked and resuming block devices completes before
user space software is resumed. I don't think that sysfs writes can
race with suspend or resume callbacks.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler
  2023-08-17 18:50   ` Bao D. Nguyen
@ 2023-08-17 19:18     ` Bart Van Assche
  0 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 19:18 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Bean Huo, Jinyoung Choi, Lu Hongfei,
	Daniil Lunev, Peter Wang, Stanley Chu, Manivannan Sadhasivam,
	Asutosh Das, Arthur Simchaev, zhanghui, Po-Wen Kao, Eric Biggers,
	Keoseong Park

On 8/17/23 11:50, Bao D. Nguyen wrote:
> On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>> +/**
>> + * ufshcd_auto_hibern8_update() - Modify the auto-hibernation control register
>> + * @hba: per-adapter instance
>> + * @ahit: New auto-hibernate settings. Includes the scale and the value of the
>> + * auto-hibernation timer. See also the UFSHCI_AHIBERN8_TIMER_MASK and
>> + * UFSHCI_AHIBERN8_SCALE_MASK constants.
>> + *
>> + * Note: enabling auto-hibernation if a zoned logical unit is present without
>> + * attaching the mq-deadline scheduler first to the zoned logical unit may cause
>> + * unaligned write errors for the zoned logical unit.
>> + */
>
> Please improve this "Note:". It seems like a run-on sentence and not very clear.

There are no grammatical errors in that note. Anyway, I will make the note
more clear.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-17 19:00   ` Bao D. Nguyen
@ 2023-08-17 19:34     ` Bart Van Assche
  2023-08-17 21:47       ` Bao D. Nguyen
  0 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 19:34 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/17/23 12:00, Bao D. Nguyen wrote:
> In legacy SDB mode with auto-hibernation enabled, the default setting for the
> driver_preserves_write_order = false.
 >
> Using the default setting, it may be missing this check that is part of the ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().

If auto-hibernation is enabled by the host driver, driver_preserves_write_order
is set by the following code in ufshcd_slave_configure():

	q->limits.driver_preserves_write_order =
		!ufshcd_is_auto_hibern8_supported(hba) ||
		FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;

Does this answer your question?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-17 19:34     ` Bart Van Assche
@ 2023-08-17 21:47       ` Bao D. Nguyen
  2023-08-17 22:05         ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 21:47 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/17/2023 12:34 PM, Bart Van Assche wrote:
> On 8/17/23 12:00, Bao D. Nguyen wrote:
>> In legacy SDB mode with auto-hibernation enabled, the default setting 
>> for the
>> driver_preserves_write_order = false.
>  >
>> Using the default setting, it may be missing this check that is part 
>> of the 
>> ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().
> 
> If auto-hibernation is enabled by the host driver, 
> driver_preserves_write_order
> is set by the following code in ufshcd_slave_configure():
> 
>      q->limits.driver_preserves_write_order =
>          !ufshcd_is_auto_hibern8_supported(hba) ||
>          FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
> 
> Does this answer your question?
Hi Bart,
My concern is that in the ufshcd_update_preserves_write_order() you have 
this logic:

	if (!preserves_write_order) {
		shost_for_each_device(sdev, hba->host) {
			struct request_queue *q = sdev->request_queue;
			/*...*/
			if (blk_queue_is_zoned(q) && !q->elevator)
				return -EPERM;
		}
	}

The above logic is only invoked when ufshcd_auto_hibern8_update() is 
called. During initialization, ufshcd_auto_hibern8_update() is not 
called. Therefore, you may have SDB mode with auto hibernate enabled -> 
preserves_write_order = false, and (blk_queue_is_zoned(q) && 
!q->elevator) is true. Is that a problem? If you called 
ufshcd_auto_hibern8_update() during ufs probe, you would have returned 
-EPERM and not enable auto-hibernation in that case.

Thanks,
Bao


> 
> Thanks,
> 
> Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  2023-08-17 19:16     ` Bart Van Assche
@ 2023-08-17 21:48       ` Bao D. Nguyen
  0 siblings, 0 replies; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-17 21:48 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	James E.J. Bottomley, Stanley Chu, Can Guo, Bean Huo,
	Asutosh Das, Arthur Simchaev, Manivannan Sadhasivam, Po-Wen Kao,
	Eric Biggers, Keoseong Park, Daniil Lunev

On 8/17/2023 12:16 PM, Bart Van Assche wrote:
> On 8/17/23 11:49, Bao D. Nguyen wrote:
>> On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>>> -    /* Enable Auto-Hibernate if configured */
>>> -    ufshcd_auto_hibern8_enable(hba);
>>> +    ufshcd_configure_auto_hibern8(hba);
>>
>> Is it possible to have a race between sysfs and syspend/resume trying 
>> to update the auto_hibern8?
> 
> Only user-space software should write into sysfs. Kernel code should
> not do this. User-space code is paused before the block driver suspend
> callbacks are invoked and resuming block devices completes before
> user space software is resumed. I don't think that sysfs writes can
> race with suspend or resume callbacks.
>Thank you for the explanation.

Thanks
Bao
> Thanks,
> 
> Bart.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-17 21:47       ` Bao D. Nguyen
@ 2023-08-17 22:05         ` Bart Van Assche
  2023-08-18  0:19           ` Bao D. Nguyen
  0 siblings, 1 reply; 46+ messages in thread
From: Bart Van Assche @ 2023-08-17 22:05 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/17/23 14:47, Bao D. Nguyen wrote:
> During initialization, ufshcd_auto_hibern8_update() is not called. Therefore,
> you may have SDB mode with auto hibernate enabled -> preserves_write_order = false, [ ... ]

Hi Bao,

ufshcd_slave_configure() is called before any SCSI commands are submitted to a
logical unit. ufshcd_slave_configure() sets 'preserves_write_order' depending on
the value of hba->ahit. Does this answer your question?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-17 22:05         ` Bart Van Assche
@ 2023-08-18  0:19           ` Bao D. Nguyen
  2023-08-18 17:56             ` Bart Van Assche
  0 siblings, 1 reply; 46+ messages in thread
From: Bao D. Nguyen @ 2023-08-18  0:19 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/17/2023 3:05 PM, Bart Van Assche wrote:
> On 8/17/23 14:47, Bao D. Nguyen wrote:
>> During initialization, ufshcd_auto_hibern8_update() is not called. 
>> Therefore,
>> you may have SDB mode with auto hibernate enabled -> 
>> preserves_write_order = false, [ ... ]
> 
> Hi Bao,
> 
> ufshcd_slave_configure() is called before any SCSI commands are 
> submitted to a
> logical unit. ufshcd_slave_configure() sets 'preserves_write_order' 
> depending on
> the value of hba->ahit. Does this answer your question?

Sorry Bart. Not yet :-) Please let me try to explain myself again.

For example, in SDB mode, after the probe and you want to enable 
auto-hibern8, you would call ufshcd_auto_hibern8_update() which then calls
ufshcd_update_preserves_write_order(). Before auto-hibern8 is enabled, 
you would check this condition:
	if (blk_queue_is_zoned(q) && !q->elevator)

In other words, auto-hibern8 is enabled only if the above condition false.

However, the during a normal operation, the ufshcd_auto_hibern8_update() 
may not get called at all, and auto-hibern8 can be enabled in SDB mode 
as part of ufs init. Would that be a problem to have auto-hibern8 
enabled without checking whether the above condition is false?

Thanks
Bao

> 
> Thanks,
> 
> Bart.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting
  2023-08-17 14:26     ` Bart Van Assche
@ 2023-08-18  2:38       ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-18  2:38 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 2023/08/17 23:26, Bart Van Assche wrote:
> On 8/17/23 04:10, Damien Le Moal wrote:
>> On 8/17/23 04:53, Bart Van Assche wrote:
>>> +/*
>>> + * Returns true if the commands in @done_q should be sorted in LBA order
>>> + * before being resubmitted.
>>> + */
>>> +static bool scsi_needs_sorting(struct list_head *done_q)
>>> +{
>>> +	struct scsi_cmnd *scmd;
>>> +
>>> +	list_for_each_entry(scmd, done_q, eh_entry) {
>>> +		struct request *rq = scsi_cmd_to_rq(scmd);
>>> +
>>> +		if (!rq->q->limits.use_zone_write_lock &&
>>> +		    blk_rq_is_seq_zoned_write(rq))
>>> +			return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +/*
>>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>>> + */
>>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>>> +			const struct list_head *_b)
>>> +{
>>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>>> +
>>> +	/* See also the comment above the list_sort() definition. */
>>> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
>>> +}
>>> +
>>> +void scsi_call_prepare_resubmit(struct list_head *done_q)
>>> +{
>>> +	struct scsi_cmnd *scmd, *next;
>>> +
>>> +	if (!scsi_needs_sorting(done_q))
>>> +		return;
>>
>> This is strange. The eh_prepare_resubmit callback is generic and its name does
>> not indicate anything related to sorting by LBAs. So this check would prevent
>> other actions not related to sorting by LBA. This should go away.
>>
>> In patch 6, based on the device characteristics, the sd driver should decides
>> if it needs to set .eh_prepare_resubmit or not.
>>
>> And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
>> return here before going through the list of commands to resubmit. Given that
>> this list should generally be small, going through it and doing nothing should
>> be OK though...
> 
> I can add a eh_prepare_resubmit == NULL check early in 
> scsi_call_prepare_resubmit(). Regarding the code inside 
> scsi_needs_sorting(), how about moving that code into an additional 
> callback function, e.g. eh_needs_prepare_resubmit? Setting 
> .eh_prepare_resubmit depending on the zone model would prevent 
> constification of struct scsi_driver.

Sounds OK.

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting
  2023-08-17 14:34     ` Bart Van Assche
@ 2023-08-18  2:41       ` Damien Le Moal
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Le Moal @ 2023-08-18  2:41 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, James E.J. Bottomley

On 2023/08/17 23:34, Bart Van Assche wrote:
> On 8/17/23 04:13, Damien Le Moal wrote:
>> On 8/17/23 04:53, Bart Van Assche wrote:
>>> +static int sd_cmp_sector(void *priv, const struct list_head *_a,
>>> +			 const struct list_head *_b)
>>> +{
>>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>>> +	struct request *rq_a = scsi_cmd_to_rq(a);
>>> +	struct request *rq_b = scsi_cmd_to_rq(b);
>>> +	bool use_zwl_a = rq_a->q->limits.use_zone_write_lock;
>>> +	bool use_zwl_b = rq_b->q->limits.use_zone_write_lock;
>>> +
>>> +	/*
>>> +	 * Order the commands that need zone write locking after the commands
>>> +	 * that do not need zone write locking. Order the commands that do not
>>> +	 * need zone write locking by LBA. Do not reorder the commands that
>>> +	 * need zone write locking. See also the comment above the list_sort()
>>> +	 * definition.
>>> +	 */
>>> +	if (use_zwl_a || use_zwl_b)
>>> +		return use_zwl_a > use_zwl_b;
>>> +	return blk_rq_pos(rq_a) > blk_rq_pos(rq_b);
>>> +}
>>> +
>>> +static void sd_prepare_resubmit(struct list_head *cmd_list)
>>> +{
>>> +	/*
>>> +	 * Sort pending SCSI commands in starting sector order. This is
>>> +	 * important if one of the SCSI devices associated with @shost is a
>>> +	 * zoned block device for which zone write locking is disabled.
>>> +	 */
>>> +	list_sort(NULL, cmd_list, sd_cmp_sector);
>>
>> We should not need to do this for regular devices or zoned devices using zone
>> write locking. So returning doing nothing would be better but the callback
>> lacks a pointer to the scsi_device to test that.
> 
> @cmd_list may include SCSI commands for multiple SCSI devices so I'm not 
> sure which SCSI device pointer you are referring to in the above comment?

This is called from scsi_eh_flush_done_q() which also does not have the scsi
device pointer. And places where this is called have the scsi host pointer, not
the device. OK. So let's no do that.

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering
  2023-08-18  0:19           ` Bao D. Nguyen
@ 2023-08-18 17:56             ` Bart Van Assche
  0 siblings, 0 replies; 46+ messages in thread
From: Bart Van Assche @ 2023-08-18 17:56 UTC (permalink / raw)
  To: Bao D. Nguyen, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Can Guo, Avri Altman, Damien Le Moal, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Bean Huo, Asutosh Das,
	Arthur Simchaev

On 8/17/23 17:19, Bao D. Nguyen wrote:
> For example, in SDB mode, after the probe and you want to enable auto-hibern8, you would call ufshcd_auto_hibern8_update() which then calls
> ufshcd_update_preserves_write_order(). Before auto-hibern8 is enabled, you would check this condition:
>      if (blk_queue_is_zoned(q) && !q->elevator)
> 
> In other words, auto-hibern8 is enabled only if the above condition false.
> 
> However, the during a normal operation, the ufshcd_auto_hibern8_update() may not get called at all, and auto-hibern8 can be enabled in SDB mode as part of ufs init. Would that be a problem to have auto-hibern8 enabled without checking whether the above condition is false?

Hi Bao,

Probing the host controller happens as follows:
* The host controller probing function is called, e.g. ufs_qcom_probe().
* ufshcd_alloc_host() and ufshcd_init() are called directly or indirectly
   by the host controller probe function.
* ufshcd_init() calls scsi_add_host() and async_schedule(ufshcd_async_scan, hba).

Once ufshcd_async_scan() is called asynchronously, it performs the
following actions:
* ufs_probe_hba() is called. This function calls ufshcd_link_startup()
   indirectly. That function invokes the link_startup_notify vop if it has
   been defined. Some host drivers set hba->ahit from inside that vop.
* scsi_scan_host() is called and calls scsi_alloc_sdev() indirectly.
* scsi_alloc_sdev() calls blk_mq_init_queue() and shost->hostt->slave_alloc().
* scsi_add_lun() calls sdev->host->hostt->slave_configure().
* scsi_add_lun() calls scsi_sysfs_add_sdev(). This indirectly causes sd_probe()
   to be called asynchronously.
* sd_probe() calls sd_revalidate_disk(). This results in an indirect call of
   disk_set_zoned(). Additionally, the ELEVATOR_F_ZBD_SEQ_WRITE flag is set by
   the sd driver if q->limits.driver_preserves_write_order has not been set.
* Still from inside sd_probe(), device_add_disk() is called and a scheduler
   is selected based on the value of q->required_elevator_features.

There are two ways in which host drivers initialize auto-hibernation:
* Either by setting hba->ahit before ufshcd_init() is called.
* Or by calling ufshcd_auto_hibern8_update() from the link_startup_notify
   vop.

I think the above shows that the zoned model is queried *after* the above methods
for enabling auto-hibernation have completed and hence that blk_queue_is_zoned()
evaluates to false if a host driver calls ufshcd_auto_hibern8_update() from
inside the link_startup_notify vop.

If auto-hibernation is enabled from inside the host driver then that should
happen before sd_probe() is called. sd_probe() will use the value of
q->limits.driver_preserves_write_order that has been set by
ufshcd_slave_configure() so there is no risk for inconsistencies between the
auto-hibernation configuration and the request queue configuration.

Bart.




^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2023-08-18 17:57 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-16 19:53 [PATCH v9 00/17] Improve performance for zoned UFS devices Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 01/17] block: Introduce more member variables related to zone write locking Bart Van Assche
2023-08-17 11:00   ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 02/17] block: Only use write locking if necessary Bart Van Assche
2023-08-17 11:01   ` Damien Le Moal
2023-08-17 14:21     ` Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 03/17] block/mq-deadline: Only use zone " Bart Van Assche
2023-08-17 11:02   ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 04/17] scsi: core: Call .eh_prepare_resubmit() before resubmitting Bart Van Assche
2023-08-17 11:10   ` Damien Le Moal
2023-08-17 14:26     ` Bart Van Assche
2023-08-18  2:38       ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 05/17] scsi: core: Add unit tests for scsi_call_prepare_resubmit() Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 06/17] scsi: sd: Sort commands by LBA before resubmitting Bart Van Assche
2023-08-17 11:13   ` Damien Le Moal
2023-08-17 14:34     ` Bart Van Assche
2023-08-18  2:41       ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 07/17] scsi: core: Retry unaligned zoned writes Bart Van Assche
2023-08-17 11:21   ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 08/17] scsi: sd_zbc: Only require an I/O scheduler if needed Bart Van Assche
2023-08-17 11:21   ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 09/17] scsi: scsi_debug: Support disabling zone write locking Bart Van Assche
2023-08-17 11:25   ` Damien Le Moal
2023-08-17 14:35     ` Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 10/17] scsi: scsi_debug: Support injecting unaligned write errors Bart Van Assche
2023-08-17 11:29   ` Damien Le Moal
2023-08-16 19:53 ` [PATCH v9 11/17] scsi: ufs: hisi: Rework the code that disables auto-hibernation Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 12/17] scsi: ufs: mediatek: Rework the code for disabling auto-hibernation Bart Van Assche
2023-08-17 18:40   ` Bao D. Nguyen
2023-08-17 19:13     ` Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 13/17] scsi: ufs: sprd: " Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 14/17] scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static Bart Van Assche
2023-08-17 18:49   ` Bao D. Nguyen
2023-08-17 19:16     ` Bart Van Assche
2023-08-17 21:48       ` Bao D. Nguyen
2023-08-16 19:53 ` [PATCH v9 15/17] scsi: ufs: Simplify ufshcd_auto_hibern8_update() Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 16/17] scsi: ufs: Forbid auto-hibernation without I/O scheduler Bart Van Assche
2023-08-17 18:50   ` Bao D. Nguyen
2023-08-17 19:18     ` Bart Van Assche
2023-08-16 19:53 ` [PATCH v9 17/17] scsi: ufs: Inform the block layer about write ordering Bart Van Assche
2023-08-17 19:00   ` Bao D. Nguyen
2023-08-17 19:34     ` Bart Van Assche
2023-08-17 21:47       ` Bao D. Nguyen
2023-08-17 22:05         ` Bart Van Assche
2023-08-18  0:19           ` Bao D. Nguyen
2023-08-18 17:56             ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.