All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS
@ 2023-07-26 19:34 Bart Van Assche
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche

Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with a UFSHCI 3.0 controller. Please
consider this patch series for the next merge window.

Thank you,

Bart.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.


Bart Van Assche (7):
  block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK
  block/mq-deadline: Only use zone locking if necessary
  block/null_blk: Support disabling zone write locking
  scsi: Retry unaligned zoned writes
  scsi: ufs: Disable zone write locking
  fs/f2fs: Disable zone write locking

 block/blk-flush.c                 |  3 ++-
 block/mq-deadline.c               | 25 ++++++++++++-----
 drivers/block/null_blk/main.c     |  2 ++
 drivers/block/null_blk/null_blk.h |  1 +
 drivers/block/null_blk/zoned.c    |  3 +++
 drivers/scsi/scsi_error.c         | 38 ++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c           |  1 +
 drivers/scsi/sd.c                 |  2 ++
 drivers/ufs/core/ufshcd.c         | 45 ++++++++++++++++++++++++++++---
 fs/f2fs/data.c                    |  4 +--
 include/linux/blk-mq.h            | 11 ++++++++
 include/linux/blk_types.h         |  8 ++++++
 include/linux/blkdev.h            | 10 +++++++
 include/scsi/scsi.h               |  1 +
 14 files changed, 142 insertions(+), 12 deletions(-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-26 23:49   ` Damien Le Moal
                     ` (2 more replies)
  2023-07-26 19:34 ` [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei

Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.

The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new flag such that block drivers
can request that zone write locking is disabled.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blkdev.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f5371b8482c..066ac395f62f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -534,6 +534,11 @@ struct request_queue {
 #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
+/*
+ * Do not use the zone write lock for sequential writes for sequential write
+ * required zones.
+ */
+#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
 #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
 #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
 #define QUEUE_FLAG_SYNCHRONOUS	11	/* always completes in submit context */
@@ -597,6 +602,11 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_skip_tagset_quiesce(q) \
 	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
 
+static inline bool blk_queue_no_zone_write_lock(struct request_queue *q)
+{
+	return test_bit(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, &q->queue_flags);
+}
+
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-27  0:03   ` Damien Le Moal
  2023-07-26 19:34 ` [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei

Not all software that supports zoned storage allocates and submits zoned
writes in LBA order per zone. Introduce the REQ_NO_WRITE_LOCK flag such
that submitters of zoned writes can indicate that zoned writes are
allocated and submitted in LBA order per zone.

Introduce the blk_no_zone_write_lock() function to make it easy to test
whether both QUEUE_FLAG_NO_ZONE_WRITE_LOCK and REQ_NO_ZONE_WRITE_LOCK
are set.

Make flush requests inherit the REQ_NO_ZONE_WRITE_LOCK flag from the
request they are derived from.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-flush.c         |  3 ++-
 include/linux/blk-mq.h    | 11 +++++++++++
 include/linux/blk_types.h |  8 ++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..038350ed7cce 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -336,7 +336,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		flush_rq->internal_tag = first_rq->internal_tag;
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
-	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
+	flush_rq->cmd_flags |= flags &
+		(REQ_FAILFAST_MASK | REQ_NO_ZONE_WRITE_LOCK | REQ_DRV);
 	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
 	flush_rq->end_io = flush_end_io;
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 01e8c31db665..dc8ec26ba2d0 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1170,6 +1170,12 @@ static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
 		blk_rq_zone_is_seq(rq);
 }
 
+static inline bool blk_no_zone_write_lock(struct request *rq)
+{
+	return blk_queue_no_zone_write_lock(rq->q) &&
+		rq->cmd_flags & REQ_NO_ZONE_WRITE_LOCK;
+}
+
 bool blk_req_needs_zone_write_lock(struct request *rq);
 bool blk_req_zone_write_trylock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
@@ -1205,6 +1211,11 @@ static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
 	return false;
 }
 
+static inline bool blk_no_zone_write_lock(struct request *rq)
+{
+	return true;
+}
+
 static inline bool blk_req_needs_zone_write_lock(struct request *rq)
 {
 	return false;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0bad62cca3d0..68f7934d8fa6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -416,6 +416,12 @@ enum req_flag_bits {
 	__REQ_PREFLUSH,		/* request for cache flush */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
+	/*
+	 * Do not use zone write locking. Setting this flag is only safe if
+	 * the request submitter allocates and submit requests in LBA order per
+	 * zone.
+	 */
+	__REQ_NO_ZONE_WRITE_LOCK,
 	__REQ_NOWAIT,           /* Don't wait if request will block */
 	__REQ_POLLED,		/* caller polls for completion using bio_poll */
 	__REQ_ALLOC_CACHE,	/* allocate IO from cache if available */
@@ -448,6 +454,8 @@ enum req_flag_bits {
 #define REQ_PREFLUSH	(__force blk_opf_t)(1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD	(__force blk_opf_t)(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND	(__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
+#define REQ_NO_ZONE_WRITE_LOCK	\
+			(__force blk_opf_t)(1ULL << __REQ_NO_ZONE_WRITE_LOCK)
 #define REQ_NOWAIT	(__force blk_opf_t)(1ULL << __REQ_NOWAIT)
 #define REQ_POLLED	(__force blk_opf_t)(1ULL << __REQ_POLLED)
 #define REQ_ALLOC_CACHE	(__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
  2023-07-26 19:34 ` [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-27  0:09   ` Damien Le Moal
  2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei

Measurements have shown that limiting the queue depth to one for zoned
writes has a significant negative performance impact on zoned UFS devices.
Hence this patch that disables zone locking by the mq-deadline scheduler
if zoned writes are submitted in order and if the storage controller
preserves the command order. This patch is based on the following
assumptions:
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The I/O priority of all write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
  submits write requests per zone in LBA order.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916ba62ee..9a64577fe942 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -338,6 +338,17 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 	return rq;
 }
 
+/*
+ * Use write locking if either QUEUE_FLAG_NO_ZONE_WRITE_LOCK or
+ * REQ_NO_ZONE_WRITE_LOCK has not been set. Not using zone write locking is
+ * only safe if the submitter allocates and submit requests in LBA order per
+ * zone and if the block driver preserves the request order.
+ */
+static bool dd_use_write_locking(struct request *rq)
+{
+	return blk_queue_is_zoned(rq->q) && !blk_no_zone_write_lock(rq);
+}
+
 /*
  * For the specified data direction, return the next request to
  * dispatch using arrival ordered lists.
@@ -353,7 +364,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;
 
 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !dd_use_write_locking(rq))
 		return rq;
 
 	/*
@@ -398,7 +409,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !dd_use_write_locking(rq))
 		return rq;
 
 	/*
@@ -526,8 +537,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}
 
 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -552,7 +564,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	/*
 	 * If the request needs its target zone locked, do it.
 	 */
-	blk_req_zone_write_lock(rq);
+	if (dd_use_write_locking(rq))
+		blk_req_zone_write_lock(rq);
 	rq->rq_flags |= RQF_STARTED;
 	return rq;
 }
@@ -933,7 +946,7 @@ static void dd_finish_request(struct request *rq)
 
 	atomic_inc(&per_prio->stats.completed);
 
-	if (blk_queue_is_zoned(q)) {
+	if (dd_use_write_locking(rq)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&dd->zone_lock, flags);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 4/7] block/null_blk: Support disabling zone write locking
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-07-26 19:34 ` [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-27  0:04   ` Damien Le Moal
                     ` (2 more replies)
  2023-07-26 19:34 ` [PATCH v4 5/7] scsi: Retry unaligned zoned writes Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Keith Busch, Akinobu Mita,
	Shin'ichiro Kawasaki

Add a new configfs attribute for disabling zone write locking. Tests
show a performance of 250 K IOPS with no I/O scheduler, 6 K IOPS with
mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
and write locking disabled. This shows that disabling write locking
results in about 20 times more IOPS for this particular test case.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c     | 2 ++
 drivers/block/null_blk/null_blk.h | 1 +
 drivers/block/null_blk/zoned.c    | 3 +++
 3 files changed, 6 insertions(+)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 864013019d6b..5c0578137f51 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
 NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
+NULLB_DEVICE_ATTR(no_zone_write_lock, bool, NULL);
 NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
 NULLB_DEVICE_ATTR(no_sched, bool, NULL);
 NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
@@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_zone_max_active,
 	&nullb_device_attr_zone_readonly,
 	&nullb_device_attr_zone_offline,
+	&nullb_device_attr_no_zone_write_lock,
 	&nullb_device_attr_virt_boundary,
 	&nullb_device_attr_no_sched,
 	&nullb_device_attr_shared_tag_bitmap,
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 929f659dd255..b521096bcc3f 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -117,6 +117,7 @@ struct nullb_device {
 	bool memory_backed; /* if data is stored in memory */
 	bool discard; /* if support discard */
 	bool zoned; /* if device is zoned */
+	bool no_zone_write_lock;
 	bool virt_boundary; /* virtual boundary on/off for the device */
 	bool no_sched; /* no IO scheduler for the device */
 	bool shared_tag_bitmap; /* use hostwide shared tags */
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 55c5b48bc276..31c8364a63e9 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -96,6 +96,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 
 	spin_lock_init(&dev->zone_res_lock);
 
+	if (dev->no_zone_write_lock)
+		blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+
 	if (dev->zone_nr_conv >= dev->nr_zones) {
 		dev->zone_nr_conv = dev->nr_zones - 1;
 		pr_info("changed the number of conventional zones to %u",

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-27  0:33   ` Damien Le Moal
  2023-07-26 19:34 ` [PATCH v4 6/7] scsi: ufs: Disable zone write locking Bart Van Assche
  2023-07-26 19:34 ` [PATCH v4 7/7] fs/f2fs: " Bart Van Assche
  6 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei, James E.J. Bottomley

If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
a starting LBA that differs from the write pointer, e.g. because zoned
writes have been reordered, then the storage device will respond with an
UNALIGNED WRITE COMMAND error. Send commands that failed with an
unaligned write error to the SCSI error handler. Let the SCSI error
handler sort SCSI commands per LBA before resubmitting these.

If zone write locking is disabled, increase the number of retries for
write commands sent to a sequential zone to the maximum number of
outstanding commands because in the worst case the number of times
reordered zoned writes have to be retried is (number of outstanding
writes per sequential zone) - 1.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 drivers/scsi/sd.c         |  2 ++
 include/scsi/scsi.h       |  1 +
 4 files changed, 42 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..9dc354a24d4b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/jiffies.h>
+#include <linux/list_sort.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -698,6 +699,17 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. This indicates that zoned writes
+		 * have been received by the device in the wrong order. If zone
+		 * write locking is disabled, retry after all pending commands
+		 * have completed.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    blk_no_zone_write_lock(scsi_cmd_to_rq(scmd)) &&
+		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd))
+			return NEEDS_DELAYED_RETRY;
+
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
@@ -2223,6 +2235,25 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 }
 EXPORT_SYMBOL(scsi_eh_flush_done_q);
 
+/*
+ * Returns a negative value if @_a has a lower LBA than @_b, zero if
+ * both have the same LBA and a positive value otherwise.
+ */
+static int scsi_cmp_lba(void *priv, const struct list_head *_a,
+			const struct list_head *_b)
+{
+	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
+	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
+	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
+	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
+
+	if (pos_a < pos_b)
+		return -1;
+	if (pos_a > pos_b)
+		return 1;
+	return 0;
+}
+
 /**
  * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
  * @shost:	Host to unjam.
@@ -2258,6 +2289,13 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
+	/*
+	 * Sort pending SCSI commands in LBA order. This is important if one of
+	 * the SCSI devices associated with @shost is a zoned block device for
+	 * which zone write locking is disabled.
+	 */
+	list_sort(NULL, &eh_work_q, scsi_cmp_lba);
+
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
 		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 59176946ab56..69da8aee13df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
 	case ADD_TO_MLQUEUE:
 		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 		break;
+	case NEEDS_DELAYED_RETRY:
 	default:
 		scsi_eh_scmd_add(cmd);
 		break;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 68b12afa0721..7b5877323245 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1235,6 +1235,8 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	if (blk_no_zone_write_lock(rq) && blk_rq_is_seq_zoned_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 
 	SCSI_LOG_HLQUEUE(1,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index ec093594ba53..6600db046227 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
  * Internal return values.
  */
 enum scsi_disposition {
+	NEEDS_DELAYED_RETRY	= 0x2000,
 	NEEDS_RETRY		= 0x2001,
 	SUCCESS			= 0x2002,
 	FAILED			= 0x2003,

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 6/7] scsi: ufs: Disable zone write locking
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-07-26 19:34 ` [PATCH v4 5/7] scsi: Retry unaligned zoned writes Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  2023-07-27 21:55   ` kernel test robot
  2023-07-26 19:34 ` [PATCH v4 7/7] fs/f2fs: " Bart Van Assche
  6 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Avri Altman, Ming Lei,
	James E.J. Bottomley, Stanley Chu, Can Guo, Asutosh Das,
	Bao D. Nguyen, Bean Huo, Arthur Simchaev

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one
  command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller. Hence, enable
  zone write locking if auto-hibernation is enabled.

This patch improves performance as follows on my test setup:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
  zone locking: 4x more IOPS for small writes.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 45 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..0f7f91e2cda9 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4337,29 +4337,67 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
 }
 EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
 
+void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
+				      bool set_no_zone_write_lock)
+{
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, hba->host)
+		blk_freeze_queue_start(sdev->request_queue);
+	shost_for_each_device(sdev, hba->host) {
+		struct request_queue *q = sdev->request_queue;
+
+		blk_mq_freeze_queue_wait(q);
+		if (set_no_zone_write_lock)
+			blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+		else
+			blk_queue_flag_clear(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
+		blk_mq_unfreeze_queue(q);
+	}
+}
+
 void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
 {
 	unsigned long flags;
-	bool update = false;
+	bool prev_state, new_state, update = false;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return;
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	prev_state = ufshcd_is_auto_hibern8_enabled(hba);
 	if (hba->ahit != ahit) {
 		hba->ahit = ahit;
 		update = true;
 	}
+	new_state = ufshcd_is_auto_hibern8_enabled(hba);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	if (update &&
-	    !pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
+	if (!update)
+		return;
+	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
+		/*
+		 * Auto-hibernation will be enabled. Enable write locking for
+		 * zoned writes since auto-hibernation may cause reordering of
+		 * zoned writes when using the legacy mode of the UFS host
+		 * controller.
+		 */
+		ufshcd_update_no_zone_write_lock(hba, false);
+	}
+	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
 		ufshcd_rpm_get_sync(hba);
 		ufshcd_hold(hba);
 		ufshcd_auto_hibern8_enable(hba);
 		ufshcd_release(hba);
 		ufshcd_rpm_put_sync(hba);
 	}
+	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
+		/*
+		 * Auto-hibernation has been disabled. Disable write locking
+		 * for zoned writes.
+		 */
+		ufshcd_update_no_zone_write_lock(hba, true);
+	}
 }
 EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
 
@@ -5139,6 +5177,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 
 	ufshcd_hpb_configure(hba, sdev);
 
+	blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
 		blk_queue_update_dma_alignment(q, SZ_4K - 1);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 7/7] fs/f2fs: Disable zone write locking
  2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-07-26 19:34 ` [PATCH v4 6/7] scsi: ufs: Disable zone write locking Bart Van Assche
@ 2023-07-26 19:34 ` Bart Van Assche
  6 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-26 19:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Bart Van Assche, Ming Lei, Chao Yu

Set the REQ_NO_ZONE_WRITE_LOCK flag to inform the block layer that F2FS
allocates and submits zoned writes in LBA order per zone.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 fs/f2fs/data.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5882afe71d82..87ef089f1e88 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -468,8 +468,8 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 	struct bio *bio;
 
 	bdev = f2fs_target_device(sbi, fio->new_blkaddr, &sector);
-	bio = bio_alloc_bioset(bdev, npages,
-				fio->op | fio->op_flags | f2fs_io_flags(fio),
+	bio = bio_alloc_bioset(bdev, npages, fio->op | fio->op_flags |
+				f2fs_io_flags(fio) | REQ_NO_ZONE_WRITE_LOCK,
 				GFP_NOIO, &f2fs_bioset);
 	bio->bi_iter.bi_sector = sector;
 	if (is_read_io(fio->op)) {

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
@ 2023-07-26 23:49   ` Damien Le Moal
  2023-07-27  0:54   ` Chaitanya Kulkarni
  2023-07-27 10:59   ` Nitesh Shetty
  2 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2023-07-26 23:49 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei

On 7/27/23 04:34, Bart Van Assche wrote:
> Writes in sequential write required zones must happen at the write
> pointer. Even if the submitter of the write commands (e.g. a filesystem)
> submits writes for sequential write required zones in order, the block
> layer or the storage controller may reorder these write commands.
> 
> The zone locking mechanism in the mq-deadline I/O scheduler serializes
> write commands for sequential zones. Some but not all storage controllers
> require this serialization. Introduce a new flag such that block drivers
> can request that zone write locking is disabled.

For the last sentence:

Introduce the new flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK to allow block device
drivers to indicate that they can preserve wrtie command ordering and thus do
not require zone write locking to be used.

Is in my opinion a lot clearer.

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  include/linux/blkdev.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f5371b8482c..066ac395f62f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -534,6 +534,11 @@ struct request_queue {
>  #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
>  #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
>  #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
> +/*
> + * Do not use the zone write lock for sequential writes for sequential write
> + * required zones.
> + */
> +#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
>  #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
>  #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
>  #define QUEUE_FLAG_SYNCHRONOUS	11	/* always completes in submit context */
> @@ -597,6 +602,11 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>  #define blk_queue_skip_tagset_quiesce(q) \
>  	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
>  
> +static inline bool blk_queue_no_zone_write_lock(struct request_queue *q)
> +{
> +	return test_bit(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, &q->queue_flags);
> +}
> +
>  extern void blk_set_pm_only(struct request_queue *q);
>  extern void blk_clear_pm_only(struct request_queue *q);

Looks good, but I really think this should be squashed with patch 2. More on
this in reply to that patch.

>  

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 ` [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK Bart Van Assche
@ 2023-07-27  0:03   ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2023-07-27  0:03 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei

On 7/27/23 04:34, Bart Van Assche wrote:
> Not all software that supports zoned storage allocates and submits zoned
> writes in LBA order per zone. Introduce the REQ_NO_WRITE_LOCK flag such
> that submitters of zoned writes can indicate that zoned writes are
> allocated and submitted in LBA order per zone.

This really needs more explanation... "Not all software that supports zoned
storage allocates and submits zoned writes in LBA order per zone" is very
misleading. I already commented on this on the last round. All writers to zones
are suppose to be writing sequentially. But while we can trust in-kernel users
to do that correctly, we cannot trust applications for the same. Hence this flag.

> Introduce the blk_no_zone_write_lock() function to make it easy to test
> whether both QUEUE_FLAG_NO_ZONE_WRITE_LOCK and REQ_NO_ZONE_WRITE_LOCK
> are set.

As mentioned in patch 1, I think it would be a lot better to squash this patch
with the previous one since the queue flag and the req flag work together are
are useless alone.

> 
> Make flush requests inherit the REQ_NO_ZONE_WRITE_LOCK flag from the
> request they are derived from.

This really needs to be moved to its own patch. But I still fail to understand
why flush needs to be touched. If flush commands are being subjected to zone
write locking, then that's a bug I think. Or I am missing something ?

> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-flush.c         |  3 ++-
>  include/linux/blk-mq.h    | 11 +++++++++++
>  include/linux/blk_types.h |  8 ++++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e73dc22d05c1..038350ed7cce 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -336,7 +336,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
>  		flush_rq->internal_tag = first_rq->internal_tag;
>  
>  	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
> +	flush_rq->cmd_flags |= flags &
> +		(REQ_FAILFAST_MASK | REQ_NO_ZONE_WRITE_LOCK | REQ_DRV);

See above. Let's move this to another patch to better explain why this change
is needed.

>  	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
>  	flush_rq->end_io = flush_end_io;
>  	/*
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 01e8c31db665..dc8ec26ba2d0 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -1170,6 +1170,12 @@ static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
>  		blk_rq_zone_is_seq(rq);
>  }
>  
> +static inline bool blk_no_zone_write_lock(struct request *rq)
> +{
> +	return blk_queue_no_zone_write_lock(rq->q) &&
> +		rq->cmd_flags & REQ_NO_ZONE_WRITE_LOCK;
> +}
> +
>  bool blk_req_needs_zone_write_lock(struct request *rq);
>  bool blk_req_zone_write_trylock(struct request *rq);
>  void __blk_req_zone_write_lock(struct request *rq);
> @@ -1205,6 +1211,11 @@ static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
>  	return false;
>  }
>  
> +static inline bool blk_no_zone_write_lock(struct request *rq)
> +{
> +	return true;
> +}
> +
>  static inline bool blk_req_needs_zone_write_lock(struct request *rq)
>  {
>  	return false;
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0bad62cca3d0..68f7934d8fa6 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -416,6 +416,12 @@ enum req_flag_bits {
>  	__REQ_PREFLUSH,		/* request for cache flush */
>  	__REQ_RAHEAD,		/* read ahead, can fail anytime */
>  	__REQ_BACKGROUND,	/* background IO */
> +	/*
> +	 * Do not use zone write locking. Setting this flag is only safe if
> +	 * the request submitter allocates and submit requests in LBA order per
> +	 * zone.

"submit requests" -> "submits write requests"
LBA -> sector (as requests use sector unit, not LBA unit)

> +	 */
> +	__REQ_NO_ZONE_WRITE_LOCK,
>  	__REQ_NOWAIT,           /* Don't wait if request will block */
>  	__REQ_POLLED,		/* caller polls for completion using bio_poll */
>  	__REQ_ALLOC_CACHE,	/* allocate IO from cache if available */
> @@ -448,6 +454,8 @@ enum req_flag_bits {
>  #define REQ_PREFLUSH	(__force blk_opf_t)(1ULL << __REQ_PREFLUSH)
>  #define REQ_RAHEAD	(__force blk_opf_t)(1ULL << __REQ_RAHEAD)
>  #define REQ_BACKGROUND	(__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
> +#define REQ_NO_ZONE_WRITE_LOCK	\
> +			(__force blk_opf_t)(1ULL << __REQ_NO_ZONE_WRITE_LOCK)
>  #define REQ_NOWAIT	(__force blk_opf_t)(1ULL << __REQ_NOWAIT)
>  #define REQ_POLLED	(__force blk_opf_t)(1ULL << __REQ_POLLED)
>  #define REQ_ALLOC_CACHE	(__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE)

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/7] block/null_blk: Support disabling zone write locking
  2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
@ 2023-07-27  0:04   ` Damien Le Moal
  2023-07-27  0:36   ` Chaitanya Kulkarni
  2023-07-27 19:02   ` Nitesh Shetty
  2 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2023-07-27  0:04 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, Chaitanya Kulkarni, Johannes Thumshirn, Vincent Fu,
	Keith Busch, Akinobu Mita, Shin'ichiro Kawasaki

On 7/27/23 04:34, Bart Van Assche wrote:
> Add a new configfs attribute for disabling zone write locking. Tests
> show a performance of 250 K IOPS with no I/O scheduler, 6 K IOPS with
> mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
> and write locking disabled. This shows that disabling write locking
> results in about 20 times more IOPS for this particular test case.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary
  2023-07-26 19:34 ` [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-07-27  0:09   ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2023-07-27  0:09 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei

On 7/27/23 04:34, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one for zoned

to one -> to one per zone

> writes has a significant negative performance impact on zoned UFS devices.
> Hence this patch that disables zone locking by the mq-deadline scheduler
> if zoned writes are submitted in order and if the storage controller

The "submitted in order" is actually never checked. So let's not use this in
this explanation.

> preserves the command order. This patch is based on the following
> assumptions:
> - It happens infrequently that zoned write requests are reordered by the
>   block layer.
> - The I/O priority of all write requests is the same per zone.
> - Either no I/O scheduler is used or an I/O scheduler is used that
>   submits write requests per zone in LBA order.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 02a916ba62ee..9a64577fe942 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -338,6 +338,17 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
>  	return rq;
>  }
>  
> +/*
> + * Use write locking if either QUEUE_FLAG_NO_ZONE_WRITE_LOCK or
> + * REQ_NO_ZONE_WRITE_LOCK has not been set. Not using zone write locking is
> + * only safe if the submitter allocates and submit requests in LBA order per
> + * zone and if the block driver preserves the request order.
> + */
> +static bool dd_use_write_locking(struct request *rq)

Nit: maybe rename this to dd_use_zone_write_locking() to be clear this is for
zoned devices only ?

> +{
> +	return blk_queue_is_zoned(rq->q) && !blk_no_zone_write_lock(rq);
> +}
> +
>  /*
>   * For the specified data direction, return the next request to
>   * dispatch using arrival ordered lists.
> @@ -353,7 +364,7 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		return NULL;
>  
>  	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !dd_use_write_locking(rq))
>  		return rq;
>  
>  	/*
> @@ -398,7 +409,7 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	if (!rq)
>  		return NULL;
>  
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !dd_use_write_locking(rq))
>  		return rq;
>  
>  	/*
> @@ -526,8 +537,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	}
>  
>  	/*
> -	 * For a zoned block device, if we only have writes queued and none of
> -	 * them can be dispatched, rq will be NULL.
> +	 * For a zoned block device that requires write serialization, if we
> +	 * only have writes queued and none of them can be dispatched, rq will
> +	 * be NULL.
>  	 */
>  	if (!rq)
>  		return NULL;
> @@ -552,7 +564,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	/*
>  	 * If the request needs its target zone locked, do it.
>  	 */
> -	blk_req_zone_write_lock(rq);
> +	if (dd_use_write_locking(rq))
> +		blk_req_zone_write_lock(rq);
>  	rq->rq_flags |= RQF_STARTED;
>  	return rq;
>  }
> @@ -933,7 +946,7 @@ static void dd_finish_request(struct request *rq)
>  
>  	atomic_inc(&per_prio->stats.completed);
>  
> -	if (blk_queue_is_zoned(q)) {
> +	if (dd_use_write_locking(rq)) {
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&dd->zone_lock, flags);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-26 19:34 ` [PATCH v4 5/7] scsi: Retry unaligned zoned writes Bart Van Assche
@ 2023-07-27  0:33   ` Damien Le Moal
  2023-07-27  1:03     ` Bart Van Assche
  2023-07-27 20:57     ` Bart Van Assche
  0 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2023-07-27  0:33 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, James E.J. Bottomley

On 7/27/23 04:34, Bart Van Assche wrote:
> If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
> a starting LBA that differs from the write pointer, e.g. because zoned
> writes have been reordered, then the storage device will respond with an
> UNALIGNED WRITE COMMAND error. Send commands that failed with an
> unaligned write error to the SCSI error handler. Let the SCSI error
> handler sort SCSI commands per LBA before resubmitting these.

This last sentence reads as if the retry is done for all cases. That is not
true. It is done only and only if zone write locking is disabled, that is for
requests with REQ_NO_ZONE_WRITE_LOCK and a request queue with
QUEUE_FLAG_NO_ZONE_WRITE_LOCK set.

> If zone write locking is disabled, increase the number of retries for
> write commands sent to a sequential zone to the maximum number of
> outstanding commands because in the worst case the number of times
> reordered zoned writes have to be retried is (number of outstanding
> writes per sequential zone) - 1.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 38 ++++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_lib.c   |  1 +
>  drivers/scsi/sd.c         |  2 ++
>  include/scsi/scsi.h       |  1 +
>  4 files changed, 42 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index c67cdcdc3ba8..9dc354a24d4b 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -27,6 +27,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/delay.h>
>  #include <linux/jiffies.h>
> +#include <linux/list_sort.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -698,6 +699,17 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		fallthrough;
>  
>  	case ILLEGAL_REQUEST:
> +		/*
> +		 * Unaligned write command. This indicates that zoned writes
> +		 * have been received by the device in the wrong order. If zone
> +		 * write locking is disabled, retry after all pending commands
> +		 * have completed.
> +		 */
> +		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
> +		    blk_no_zone_write_lock(scsi_cmd_to_rq(scmd)) &&
> +		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd))
> +			return NEEDS_DELAYED_RETRY;
> +
>  		if (sshdr.asc == 0x20 || /* Invalid command operation code */
>  		    sshdr.asc == 0x21 || /* Logical block address out of range */
>  		    sshdr.asc == 0x22 || /* Invalid function */
> @@ -2223,6 +2235,25 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>  }
>  EXPORT_SYMBOL(scsi_eh_flush_done_q);
>  
> +/*
> + * Returns a negative value if @_a has a lower LBA than @_b, zero if
> + * both have the same LBA and a positive value otherwise.
> + */
> +static int scsi_cmp_lba(void *priv, const struct list_head *_a,
> +			const struct list_head *_b)
> +{
> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
> +	const sector_t pos_a = blk_rq_pos(scsi_cmd_to_rq(a));
> +	const sector_t pos_b = blk_rq_pos(scsi_cmd_to_rq(b));
> +
> +	if (pos_a < pos_b)
> +		return -1;
> +	if (pos_a > pos_b)
> +		return 1;
> +	return 0;
> +}
> +
>  /**
>   * scsi_unjam_host - Attempt to fix a host which has a cmd that failed.
>   * @shost:	Host to unjam.
> @@ -2258,6 +2289,13 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>  
>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>  
> +	/*
> +	 * Sort pending SCSI commands in LBA order. This is important if one of
> +	 * the SCSI devices associated with @shost is a zoned block device for
> +	 * which zone write locking is disabled.
> +	 */
> +	list_sort(NULL, &eh_work_q, scsi_cmp_lba);
> +
>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
>  		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 59176946ab56..69da8aee13df 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
>  	case ADD_TO_MLQUEUE:
>  		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  		break;
> +	case NEEDS_DELAYED_RETRY:
>  	default:
>  		scsi_eh_scmd_add(cmd);
>  		break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 68b12afa0721..7b5877323245 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1235,6 +1235,8 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	cmd->transfersize = sdp->sector_size;
>  	cmd->underflow = nr_blocks << 9;
>  	cmd->allowed = sdkp->max_retries;
> +	if (blk_no_zone_write_lock(rq) && blk_rq_is_seq_zoned_write(rq))
> +		cmd->allowed += rq->q->nr_requests;

Why += ? Shouldn't this be "=" ?

That said, reading and commenting on your patches gave me a better
understanding of how this works. With that, I am now thinking that
QUEUE_FLAG_NO_ZONE_WRITE_LOCK may be the only thing needed.
My thinking:
1) For well behaved applications correctly issuing writes sequentially per zone
from the write pointer, we will not get any performance benefits for device
with QUEUE_FLAG_NO_ZONE_WRITE_LOCK. But we could if REQ_NO_ZONE_WRITE_LOCK was set.
2) For buggy applications that fail to write zones sequentially, they already
today get an error. That will remain true even for devices with
QUEUE_FLAG_NO_ZONE_WRITE_LOCK and even if we set REQ_NO_ZONE_WRITE_LOCK. The
only difference from today will be that the error return to the application
will be delayed by all the retries, but an error will still be returned,
always, even in the case of an application completely forgetting to issue one
IO (i.e. the application submitted a write stream with a hole in it). There
will be no deadlock given that the retry is within the scsi layer only, not
back to the block layer (which could create deadlocks preventing forward progress).
3) For in-kernel users (f2fs, btrfs, zonefs, dm-zoned), we know writes happen
sequentially, even when stacked on top of zone compliant DMs (dm-crypt,
dm-linear). They all could set REQ_NO_ZONE_WRITE_LOCK for devices with
QUEUE_FLAG_NO_ZONE_WRITE_LOCK and get the same performance boost as you see for
f2fs.

From all this, and given that for (3) REQ_NO_ZONE_WRITE_LOCK is set
unconditionally, it now seems to me that this request flag is useless...
Thoughts ?

>  	cmd->sdb.length = nr_blocks * sdp->sector_size;
>  
>  	SCSI_LOG_HLQUEUE(1,
> diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
> index ec093594ba53..6600db046227 100644
> --- a/include/scsi/scsi.h
> +++ b/include/scsi/scsi.h
> @@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
>   * Internal return values.
>   */
>  enum scsi_disposition {
> +	NEEDS_DELAYED_RETRY	= 0x2000,
>  	NEEDS_RETRY		= 0x2001,
>  	SUCCESS			= 0x2002,
>  	FAILED			= 0x2003,

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/7] block/null_blk: Support disabling zone write locking
  2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
  2023-07-27  0:04   ` Damien Le Moal
@ 2023-07-27  0:36   ` Chaitanya Kulkarni
  2023-07-27 20:26     ` Bart Van Assche
  2023-07-27 19:02   ` Nitesh Shetty
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2023-07-27  0:36 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Ming Lei, Chaitanya Kulkarni, Jens Axboe,
	Johannes Thumshirn, Vincent Fu, Keith Busch, Akinobu Mita,
	Shin'ichiro Kawasaki

On 7/26/2023 12:34 PM, Bart Van Assche wrote:
> Add a new configfs attribute for disabling zone write locking. Tests
> show a performance of 250 K IOPS with no I/O scheduler, 6 K IOPS with
> mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
> and write locking disabled. This shows that disabling write locking
> results in about 20 times more IOPS for this particular test case.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---

This patch looks good to me, perhaps a blktests would be nice in zbd
category ?

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
  2023-07-26 23:49   ` Damien Le Moal
@ 2023-07-27  0:54   ` Chaitanya Kulkarni
  2023-07-27 20:20     ` Bart Van Assche
  2023-07-27 10:59   ` Nitesh Shetty
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2023-07-27  0:54 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Ming Lei, Jens Axboe

On 7/26/2023 12:34 PM, Bart Van Assche wrote:
> Writes in sequential write required zones must happen at the write
> pointer. Even if the submitter of the write commands (e.g. a filesystem)
> submits writes for sequential write required zones in order, the block
> layer or the storage controller may reorder these write commands.
> 
> The zone locking mechanism in the mq-deadline I/O scheduler serializes
> write commands for sequential zones. Some but not all storage controllers
> require this serialization. Introduce a new flag such that block drivers
> can request that zone write locking is disabled.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   include/linux/blkdev.h | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f5371b8482c..066ac395f62f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -534,6 +534,11 @@ struct request_queue {
>   #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
>   #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
>   #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
> +/*
> + * Do not use the zone write lock for sequential writes for sequential write
> + * required zones.
> + */

In first go I got little confused with above comment, how about :-

/*
  * When issuing sequential writes on zone type
  * BLK_ZONE_TYPE_SEQWRITE_REQ, don't use zone write locking.
  */

It makes it easier to search in code with BLK_ZONE_TYPE_SEQWRITE_REQ
but if everyone else is okay with original comment or if it is not
correct suggestion feel free to ignore my suggestion ...

> +#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
>   #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
>   #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
>   #define QUEUE_FLAG_SYNCHRONOUS	11	/* always completes in submit context */
> @@ -597,6 +602,11 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>   #define blk_queue_skip_tagset_quiesce(q) \
>   	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
>   
> +static inline bool blk_queue_no_zone_write_lock(struct request_queue *q)
> +{
> +	return test_bit(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, &q->queue_flags);
> +}
> +

Is it true above helper is only called from blk_no_zone_write_lock() ?

if that is true open coding above call into blk_no_zone_write_lock()
makes it more readable to me while reviewing the series, but other are
okay then ignore my comment..

>   extern void blk_set_pm_only(struct request_queue *q);
>   extern void blk_clear_pm_only(struct request_queue *q);
>   

-ck



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-27  0:33   ` Damien Le Moal
@ 2023-07-27  1:03     ` Bart Van Assche
  2023-07-27  1:09       ` Damien Le Moal
  2023-07-27 20:57     ` Bart Van Assche
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27  1:03 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, James E.J. Bottomley

On 7/26/23 17:33, Damien Le Moal wrote:
>  From all this, and given that for (3) REQ_NO_ZONE_WRITE_LOCK is set
> unconditionally, it now seems to me that this request flag is useless...
> Thoughts ?

Hi Damien,

Thanks for having taken a close look at this patch series.

The flag REQ_NO_ZONE_WRITE_LOCK was introduced based on earlier review
feedback. Removing that flag again would help me because that would
allow me to develop a test for the blktest suite that submits I/O
directly to the block device instead of an F2FS filesystem. I like
F2FS but it's probably good to minimize the number of layers when
writing blktest tests.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-27  1:03     ` Bart Van Assche
@ 2023-07-27  1:09       ` Damien Le Moal
  2023-07-27 14:53         ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2023-07-27  1:09 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, James E.J. Bottomley

On 7/27/23 10:03, Bart Van Assche wrote:
> On 7/26/23 17:33, Damien Le Moal wrote:
>>  From all this, and given that for (3) REQ_NO_ZONE_WRITE_LOCK is set
>> unconditionally, it now seems to me that this request flag is useless...
>> Thoughts ?
> 
> Hi Damien,
> 
> Thanks for having taken a close look at this patch series.
> 
> The flag REQ_NO_ZONE_WRITE_LOCK was introduced based on earlier review
> feedback. Removing that flag again would help me because that would
> allow me to develop a test for the blktest suite that submits I/O
> directly to the block device instead of an F2FS filesystem. I like
> F2FS but it's probably good to minimize the number of layers when
> writing blktest tests.

Sounds good.

If you could also test the series with zonefs, to hammer it some more that
would be good (I unfortunately do not have any zoned UFS devices to run that
myself). You can run zonefs test suite (see
https://github.com/westerndigitalcorporation/zonefs-tools/tree/master/tests).
Simply execute "zonefs-tests.sh /dev/sdX" as root and everything should run
(need zonefs-tools and fio installed).

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
  2023-07-26 23:49   ` Damien Le Moal
  2023-07-27  0:54   ` Chaitanya Kulkarni
@ 2023-07-27 10:59   ` Nitesh Shetty
  2023-07-27 14:43     ` Bart Van Assche
  2 siblings, 1 reply; 27+ messages in thread
From: Nitesh Shetty @ 2023-07-27 10:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Jaegeuk Kim, Damien Le Moal, Ming Lei

On Thu, Jul 27, 2023 at 1:39 AM Bart Van Assche <bvanassche@acm.org> wrote:
> + * Do not use the zone write lock for sequential writes for sequential write
> + * required zones.
> + */
> +#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8

Instead of QUEUE_FLAG_NO_ZONE_WRITE_LOCK I feel
QUEUE_FLAG_SKIP_ZONE_WRITE_LOCK is better.

--Nitesh Shetty

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-27 10:59   ` Nitesh Shetty
@ 2023-07-27 14:43     ` Bart Van Assche
  2023-07-27 19:06       ` Nitesh Shetty
  0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27 14:43 UTC (permalink / raw)
  To: Nitesh Shetty
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Jaegeuk Kim, Damien Le Moal, Ming Lei

On 7/27/23 03:59, Nitesh Shetty wrote:
> On Thu, Jul 27, 2023 at 1:39 AM Bart Van Assche <bvanassche@acm.org> wrote:
>> + * Do not use the zone write lock for sequential writes for sequential write
>> + * required zones.
>> + */
>> +#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
> 
> Instead of QUEUE_FLAG_NO_ZONE_WRITE_LOCK I feel
> QUEUE_FLAG_SKIP_ZONE_WRITE_LOCK is better.

Hmm ... this patch series makes it possible to use an UFS storage 
controller and a zoned UFS device with no I/O scheduler attached. If no 
I/O scheduler is active, there is no zone write locking to skip so I 
think that the former name is better.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-27  1:09       ` Damien Le Moal
@ 2023-07-27 14:53         ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27 14:53 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, James E.J. Bottomley

On 7/26/23 18:09, Damien Le Moal wrote:
> If you could also test the series with zonefs, to hammer it some more that
> would be good (I unfortunately do not have any zoned UFS devices to run that
> myself). You can run zonefs test suite (see
> https://github.com/westerndigitalcorporation/zonefs-tools/tree/master/tests).
> Simply execute "zonefs-tests.sh /dev/sdX" as root and everything should run
> (need zonefs-tools and fio installed).

Running zonefs-tests.sh on an Android device might require a 
considerable amount of work. zonefs-tests.sh is a bash script. Bash is 
not available on Android devices. From 
https://android.googlesource.com/platform/system/core/+/master/shell_and_utilities/README.md:
----------------------------------------------------------------------
Since IceCreamSandwich Android has used mksh as its shell. Before then 
it used ash (which actually remained unused in the tree up to and 
including KitKat).

Initially Android had a very limited command-line provided by its own 
“toolbox” binary. Since Marshmallow almost everything is supplied by 
toybox instead.
----------------------------------------------------------------------

Another challenge is that zonefs-tools depends on at least one header 
file that is not included in the Android NDK (blkid/blkid.h).

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/7] block/null_blk: Support disabling zone write locking
  2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
  2023-07-27  0:04   ` Damien Le Moal
  2023-07-27  0:36   ` Chaitanya Kulkarni
@ 2023-07-27 19:02   ` Nitesh Shetty
  2 siblings, 0 replies; 27+ messages in thread
From: Nitesh Shetty @ 2023-07-27 19:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Jaegeuk Kim, Damien Le Moal, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Keith Busch, Akinobu Mita,
	Shin'ichiro Kawasaki

On Thu, Jul 27, 2023 at 2:22 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> Add a new configfs attribute for disabling zone write locking. Tests
> show a performance of 250 K IOPS with no I/O scheduler, 6 K IOPS with
> mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
> and write locking disabled. This shows that disabling write locking
> results in about 20 times more IOPS for this particular test case.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/block/null_blk/main.c     | 2 ++
>  drivers/block/null_blk/null_blk.h | 1 +
>  drivers/block/null_blk/zoned.c    | 3 +++
>  3 files changed, 6 insertions(+)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 864013019d6b..5c0578137f51 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
>  NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
>  NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
>  NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
> +NULLB_DEVICE_ATTR(no_zone_write_lock, bool, NULL);
>  NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
>  NULLB_DEVICE_ATTR(no_sched, bool, NULL);
>  NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
> @@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
>         &nullb_device_attr_zone_max_active,
>         &nullb_device_attr_zone_readonly,
>         &nullb_device_attr_zone_offline,
> +       &nullb_device_attr_no_zone_write_lock,
>         &nullb_device_attr_virt_boundary,
>         &nullb_device_attr_no_sched,
>         &nullb_device_attr_shared_tag_bitmap,
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 929f659dd255..b521096bcc3f 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -117,6 +117,7 @@ struct nullb_device {
>         bool memory_backed; /* if data is stored in memory */
>         bool discard; /* if support discard */
>         bool zoned; /* if device is zoned */
> +       bool no_zone_write_lock;
>         bool virt_boundary; /* virtual boundary on/off for the device */
>         bool no_sched; /* no IO scheduler for the device */
>         bool shared_tag_bitmap; /* use hostwide shared tags */
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index 55c5b48bc276..31c8364a63e9 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -96,6 +96,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
>
>         spin_lock_init(&dev->zone_res_lock);
>
> +       if (dev->no_zone_write_lock)
> +               blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
> +
>         if (dev->zone_nr_conv >= dev->nr_zones) {
>                 dev->zone_nr_conv = dev->nr_zones - 1;
>                 pr_info("changed the number of conventional zones to %u",

Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-27 14:43     ` Bart Van Assche
@ 2023-07-27 19:06       ` Nitesh Shetty
  0 siblings, 0 replies; 27+ messages in thread
From: Nitesh Shetty @ 2023-07-27 19:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Martin K . Petersen,
	Jaegeuk Kim, Damien Le Moal, Ming Lei

On Thu, Jul 27, 2023 at 8:13 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 7/27/23 03:59, Nitesh Shetty wrote:
> > On Thu, Jul 27, 2023 at 1:39 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >> + * Do not use the zone write lock for sequential writes for sequential write
> >> + * required zones.
> >> + */
> >> +#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
> >
> > Instead of QUEUE_FLAG_NO_ZONE_WRITE_LOCK I feel
> > QUEUE_FLAG_SKIP_ZONE_WRITE_LOCK is better.
>
> Hmm ... this patch series makes it possible to use an UFS storage
> controller and a zoned UFS device with no I/O scheduler attached. If no
> I/O scheduler is active, there is no zone write locking to skip so I
> think that the former name is better.

Okay, I missed no I/O scheduler.

Thanks,
Nitesh Shetty

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-27  0:54   ` Chaitanya Kulkarni
@ 2023-07-27 20:20     ` Bart Van Assche
  2023-07-27 20:52       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27 20:20 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Ming Lei, Jens Axboe

On 7/26/23 17:54, Chaitanya Kulkarni wrote:
> On 7/26/2023 12:34 PM, Bart Van Assche wrote:
>> +/*
>> + * Do not use the zone write lock for sequential writes for sequential write
>> + * required zones.
>> + */
> 
> In first go I got little confused with above comment, how about :-
> 
> /*
>    * When issuing sequential writes on zone type
>    * BLK_ZONE_TYPE_SEQWRITE_REQ, don't use zone write locking.
>    */
> 
> It makes it easier to search in code with BLK_ZONE_TYPE_SEQWRITE_REQ
> but if everyone else is okay with original comment or if it is not
> correct suggestion feel free to ignore my suggestion ...

How about this comment?

/*
  * Do not serialize sequential writes sent to a sequential write
  * required zone (BLK_ZONE_TYPE_SEQWRITE_REQ).
  */

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 4/7] block/null_blk: Support disabling zone write locking
  2023-07-27  0:36   ` Chaitanya Kulkarni
@ 2023-07-27 20:26     ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27 20:26 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Ming Lei, Jens Axboe, Johannes Thumshirn,
	Vincent Fu, Keith Busch, Akinobu Mita, Shin'ichiro Kawasaki

On 7/26/23 17:36, Chaitanya Kulkarni wrote:
> On 7/26/2023 12:34 PM, Bart Van Assche wrote:
>> Add a new configfs attribute for disabling zone write locking. Tests
>> show a performance of 250 K IOPS with no I/O scheduler, 6 K IOPS with
>> mq-deadline and write locking enabled and 123 K IOPS with mq-deadline
>> and write locking disabled. This shows that disabling write locking
>> results in about 20 times more IOPS for this particular test case.
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Damien Le Moal <dlemoal@kernel.org>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
> 
> This patch looks good to me, perhaps a blktests would be nice in zbd
> category ?
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

Hi Chaitanya,

I will work on adding a blktest test that runs with zone write locking 
disabled.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK
  2023-07-27 20:20     ` Bart Van Assche
@ 2023-07-27 20:52       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2023-07-27 20:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Damien Le Moal, Ming Lei, Jens Axboe

On 7/27/2023 1:20 PM, Bart Van Assche wrote:
> On 7/26/23 17:54, Chaitanya Kulkarni wrote:
>> On 7/26/2023 12:34 PM, Bart Van Assche wrote:
>>> +/*
>>> + * Do not use the zone write lock for sequential writes for 
>>> sequential write
>>> + * required zones.
>>> + */
>>
>> In first go I got little confused with above comment, how about :-
>>
>> /*
>>    * When issuing sequential writes on zone type
>>    * BLK_ZONE_TYPE_SEQWRITE_REQ, don't use zone write locking.
>>    */
>>
>> It makes it easier to search in code with BLK_ZONE_TYPE_SEQWRITE_REQ
>> but if everyone else is okay with original comment or if it is not
>> correct suggestion feel free to ignore my suggestion ...
> 
> How about this comment?
> 
> /*
>   * Do not serialize sequential writes sent to a sequential write
>   * required zone (BLK_ZONE_TYPE_SEQWRITE_REQ).
>   */
> 
> Thanks,
> 
> Bart.
> 

works for me ..

-ck



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 5/7] scsi: Retry unaligned zoned writes
  2023-07-27  0:33   ` Damien Le Moal
  2023-07-27  1:03     ` Bart Van Assche
@ 2023-07-27 20:57     ` Bart Van Assche
  1 sibling, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2023-07-27 20:57 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim,
	Ming Lei, James E.J. Bottomley

On 7/26/23 17:33, Damien Le Moal wrote:
> On 7/27/23 04:34, Bart Van Assche wrote:
>> +	if (blk_no_zone_write_lock(rq) && blk_rq_is_seq_zoned_write(rq))
>> +		cmd->allowed += rq->q->nr_requests;
> 
> Why += ? Shouldn't this be "=" ?
Hi Damien,

I think that the retry mechanism is also used to retry commands for 
which a unit attention was reported. Hence "+=" instead of "=".

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 6/7] scsi: ufs: Disable zone write locking
  2023-07-26 19:34 ` [PATCH v4 6/7] scsi: ufs: Disable zone write locking Bart Van Assche
@ 2023-07-27 21:55   ` kernel test robot
  0 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-07-27 21:55 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: oe-kbuild-all, linux-block, Christoph Hellwig,
	Martin K . Petersen, Jaegeuk Kim, Damien Le Moal,
	Bart Van Assche, Avri Altman, Ming Lei, James E.J. Bottomley,
	Stanley Chu, Can Guo, Asutosh Das, Bao D. Nguyen, Bean Huo,
	Arthur Simchaev

Hi Bart,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.5-rc3]
[cannot apply to mkp-scsi/for-next jejb-scsi/for-next next-20230727]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Bart-Van-Assche/block-Introduce-the-flag-QUEUE_FLAG_NO_ZONE_WRITE_LOCK/20230727-033820
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20230726193440.1655149-7-bvanassche%40acm.org
patch subject: [PATCH v4 6/7] scsi: ufs: Disable zone write locking
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230728/202307280537.bj45anmo-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230728/202307280537.bj45anmo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307280537.bj45anmo-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/ufs/core/ufshcd.c:4340:6: warning: no previous prototype for 'ufshcd_update_no_zone_write_lock' [-Wmissing-prototypes]
    4340 | void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/ufshcd_update_no_zone_write_lock +4340 drivers/ufs/core/ufshcd.c

  4339	
> 4340	void ufshcd_update_no_zone_write_lock(struct ufs_hba *hba,
  4341					      bool set_no_zone_write_lock)
  4342	{
  4343		struct scsi_device *sdev;
  4344	
  4345		shost_for_each_device(sdev, hba->host)
  4346			blk_freeze_queue_start(sdev->request_queue);
  4347		shost_for_each_device(sdev, hba->host) {
  4348			struct request_queue *q = sdev->request_queue;
  4349	
  4350			blk_mq_freeze_queue_wait(q);
  4351			if (set_no_zone_write_lock)
  4352				blk_queue_flag_set(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
  4353			else
  4354				blk_queue_flag_clear(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, q);
  4355			blk_mq_unfreeze_queue(q);
  4356		}
  4357	}
  4358	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-07-27 21:56 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-26 19:34 [PATCH v4 0/7] Improve the performance of F2FS on zoned UFS Bart Van Assche
2023-07-26 19:34 ` [PATCH v4 1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK Bart Van Assche
2023-07-26 23:49   ` Damien Le Moal
2023-07-27  0:54   ` Chaitanya Kulkarni
2023-07-27 20:20     ` Bart Van Assche
2023-07-27 20:52       ` Chaitanya Kulkarni
2023-07-27 10:59   ` Nitesh Shetty
2023-07-27 14:43     ` Bart Van Assche
2023-07-27 19:06       ` Nitesh Shetty
2023-07-26 19:34 ` [PATCH v4 2/7] block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK Bart Van Assche
2023-07-27  0:03   ` Damien Le Moal
2023-07-26 19:34 ` [PATCH v4 3/7] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
2023-07-27  0:09   ` Damien Le Moal
2023-07-26 19:34 ` [PATCH v4 4/7] block/null_blk: Support disabling zone write locking Bart Van Assche
2023-07-27  0:04   ` Damien Le Moal
2023-07-27  0:36   ` Chaitanya Kulkarni
2023-07-27 20:26     ` Bart Van Assche
2023-07-27 19:02   ` Nitesh Shetty
2023-07-26 19:34 ` [PATCH v4 5/7] scsi: Retry unaligned zoned writes Bart Van Assche
2023-07-27  0:33   ` Damien Le Moal
2023-07-27  1:03     ` Bart Van Assche
2023-07-27  1:09       ` Damien Le Moal
2023-07-27 14:53         ` Bart Van Assche
2023-07-27 20:57     ` Bart Van Assche
2023-07-26 19:34 ` [PATCH v4 6/7] scsi: ufs: Disable zone write locking Bart Van Assche
2023-07-27 21:55   ` kernel test robot
2023-07-26 19:34 ` [PATCH v4 7/7] fs/f2fs: " Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.