* [PATCH 0/8] Enable zoned write pipelining for UFS devices
@ 2023-01-09 23:27 Bart Van Assche
  2023-01-09 23:27 ` [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq() Bart Van Assche
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche

Hi Jens,

This patch series improves write performance for zoned UFS devices that
implement the host-managed (sequential write required) model. Please consider
these patches for the next merge window.

Thanks,

Bart.

Bart Van Assche (8):
  block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq()
  block: Introduce the blk_rq_is_seq_zone_write() function
  block: Introduce a request queue flag for pipelining zoned writes
  block/mq-deadline: Only use zone locking if necessary
  block/null_blk: Refactor null_queue_rq()
  block/null_blk: Add support for pipelining zoned writes
  scsi: Retry unaligned zoned writes
  scsi: ufs: Enable zoned write pipelining

 block/blk-zoned.c                 |  3 ++-
 block/mq-deadline.c               | 14 +++++++++-----
 drivers/block/null_blk/main.c     | 30 ++++++++++++++++++++----------
 drivers/block/null_blk/null_blk.h |  3 +++
 drivers/block/null_blk/zoned.c    | 13 ++++++++++++-
 drivers/scsi/scsi_error.c         |  7 +++++++
 drivers/scsi/sd.c                 |  3 +++
 drivers/ufs/core/ufshcd.c         |  1 +
 include/linux/blk-mq.h            | 30 ++++++++++++++++++++++++++++++
 include/linux/blkdev.h            | 16 ++++++++++++++++
 10 files changed, 103 insertions(+), 17 deletions(-)



* [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq()
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:36   ` Damien Le Moal
  2023-01-09 23:27 ` [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function Bart Van Assche
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Since it is nontrivial to figure out how disk_zone_is_seq() and
blk_rq_zone_is_seq() handle sequential write preferred zones, document
this.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blk-mq.h | 7 +++++++
 include/linux/blkdev.h | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 779fba613bd0..6735db1ad24d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1155,6 +1155,13 @@ static inline unsigned int blk_rq_zone_no(struct request *rq)
 	return disk_zone_no(rq->q->disk, blk_rq_pos(rq));
 }
 
+/**
+ * blk_rq_zone_is_seq() - Whether a request is for a sequential zone.
+ * @rq: Request pointer.
+ *
+ * Return: true if and only if blk_rq_pos(@rq) refers either to a sequential
+ * write required or to a sequential write preferred zone.
+ */
 static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
 {
 	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b87ed829ab94..ef93e848b1fd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -672,6 +672,15 @@ static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
 	return sector >> ilog2(disk->queue->limits.chunk_sectors);
 }
 
+/**
+ * disk_zone_is_seq() - Whether a logical block is in a sequential zone.
+ * @disk: Disk pointer.
+ * @sector: Offset from start of block device in 512 byte units.
+ *
+ * Return: true if and only if @disk refers to a zoned block device and
+ * @sector refers either to a sequential write required or a sequential
+ * write preferred zone.
+ */
 static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)
 {
 	if (!blk_queue_is_zoned(disk->queue))


* [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
  2023-01-09 23:27 ` [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq() Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:38   ` Damien Le Moal
  2023-01-09 23:27 ` [PATCH 3/8] block: Introduce a request queue flag for pipelining zoned writes Bart Van Assche
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Introduce a function that makes it easy to verify whether a write
request is for a sequential write required or sequential write preferred
zone.
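
For reference, patch 7/8 in this series combines this helper with the
queue flag introduced in patch 3/8 when deciding whether to increase the
retry budget of a SCSI command. A minimal sketch of such a caller (the
function name is illustrative only):

    /*
     * Sketch: true if @rq is a zoned write that may be pipelined and
     * hence may need extra retries after an unaligned write error.
     * Mirrors the check that patch 7/8 adds to sd_setup_read_write_cmnd().
     */
    static bool rq_may_need_unaligned_write_retries(struct request *rq)
    {
            return blk_queue_pipeline_zoned_writes(rq->q) &&
                   blk_rq_is_seq_zone_write(rq);
    }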

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blk-mq.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 6735db1ad24d..29f834e5c3a9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1167,6 +1167,24 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
 	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
 }
 
+/**
+ * blk_rq_is_seq_zone_write() - Whether @rq is a write request for a sequential zone.
+ * @rq: Request to examine.
+ *
+ * In this context sequential zone means either a sequential write required or
+ * to a sequential write preferred zone.
+ */
+static inline bool blk_rq_is_seq_zone_write(struct request *rq)
+{
+	switch (req_op(rq)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_ZEROES:
+		return blk_rq_zone_is_seq(rq);
+	default:
+		return false;
+	}
+}
+
 bool blk_req_needs_zone_write_lock(struct request *rq);
 bool blk_req_zone_write_trylock(struct request *rq);
 void __blk_req_zone_write_lock(struct request *rq);
@@ -1197,6 +1215,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
 	return !blk_req_zone_is_write_locked(rq);
 }
 #else /* CONFIG_BLK_DEV_ZONED */
+static inline bool blk_rq_is_seq_zone_write(struct request *rq)
+{
+	return false;
+}
+
 static inline bool blk_req_needs_zone_write_lock(struct request *rq)
 {
 	return false;


* [PATCH 3/8] block: Introduce a request queue flag for pipelining zoned writes
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
  2023-01-09 23:27 ` [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq() Bart Van Assche
  2023-01-09 23:27 ` [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:27 ` [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.

The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new flag such that block drivers
can request pipelining of writes for sequential write required zones.

An example of a storage controller standard that requires write
serialization is AHCI (Advanced Host Controller Interface). Submitting
commands to AHCI controllers happens by writing a bit pattern into a
register. Each set bit corresponds to an active command. This mechanism
does not preserve command ordering information.
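
Opting in is a single call at queue configuration time; patch 8/8 does
exactly this for UFS. A minimal sketch for a hypothetical driver whose
controller preserves per-hardware-queue submission order (the driver name
is made up; the flag is the one added by this patch, and
blk_queue_flag_set() is the existing helper for setting queue flags):

    /* Sketch: opt in to pipelined zoned writes while configuring the
     * request queue of a controller that dispatches commands in
     * submission order per hardware queue.
     */
    static void mydrv_configure_queue(struct request_queue *q)
    {
            blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
    }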

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ef93e848b1fd..b4b4cd3f2912 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -553,6 +553,8 @@ struct request_queue {
 #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
+/* Writes for sequential write required zones may be pipelined. */
+#define QUEUE_FLAG_PIPELINE_ZONED_WRITES 8
 #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
 #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
 #define QUEUE_FLAG_SAME_FORCE	12	/* force complete on same CPU */
@@ -614,6 +616,11 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_skip_tagset_quiesce(q) \
 	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
 
+static inline bool blk_queue_pipeline_zoned_writes(struct request_queue *q)
+{
+	return test_bit(QUEUE_FLAG_PIPELINE_ZONED_WRITES, &q->queue_flags);
+}
+
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
 


* [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-01-09 23:27 ` [PATCH 3/8] block: Introduce a request queue flag for pipelining zoned writes Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:46   ` Damien Le Moal
  2023-01-09 23:27 ` [PATCH 5/8] block/null_blk: Refactor null_queue_rq() Bart Van Assche
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Measurements have shown that limiting the queue depth to one for zoned
writes has a significant negative performance impact on zoned UFS devices.
Hence this patch that disables zone locking from the mq-deadline scheduler
for storage controllers that support pipelining zoned writes. This patch is
based on the following assumptions:
- Applications submit write requests to sequential write required zones
  in order.
- It happens infrequently that zoned write requests are reordered by the
  block layer.
- The storage controller does not reorder write requests that have been
  submitted to the same hardware queue. This is the case for UFS: the
  UFSHCI specification requires that UFS controllers process requests in
  order per hardware queue.
- The I/O priority of all pipelined write requests is the same per zone.
- Either no I/O scheduler is used or an I/O scheduler is used that
  submits write requests per zone in LBA order.

If applications submit write requests to sequential write required zones
in order, at least one of the pending requests will succeed. Hence, the
number of retries that is required is at most (number of pending
requests) - 1.
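
As a hypothetical illustration of that bound, consider four pipelined
writes W0..W3 for the same zone, submitted in LBA order but with W1
overtaking W0 on the way to the device:

    Dispatch order: W1, W0, W2, W3
    Round 1: W1 fails (unaligned), W0 succeeds at the write pointer,
             W2 and W3 fail (unaligned); W1, W2 and W3 are requeued.
    Round 2: W1, W2 and W3 are retried in LBA order and all succeed.

In the worst case only the write that matches the write pointer succeeds
in each round, so no write needs more than (number of pending requests) - 1
retries.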

Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-zoned.c   |  3 ++-
 block/mq-deadline.c | 14 +++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index db829401d8d0..158638091e39 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -520,7 +520,8 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
 		break;
 	case BLK_ZONE_TYPE_SEQWRITE_REQ:
 	case BLK_ZONE_TYPE_SEQWRITE_PREF:
-		if (!args->seq_zones_wlock) {
+		if (!blk_queue_pipeline_zoned_writes(q) &&
+		    !args->seq_zones_wlock) {
 			args->seq_zones_wlock =
 				blk_alloc_zone_bitmap(q->node, args->nr_zones);
 			if (!args->seq_zones_wlock)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f10c2a0d18d4..e41808c0b007 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -339,7 +339,8 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		return NULL;
 
 	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    blk_queue_pipeline_zoned_writes(rq->q))
 		return rq;
 
 	/*
@@ -378,7 +379,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	if (!rq)
 		return NULL;
 
-	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
+	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
+	    blk_queue_pipeline_zoned_writes(rq->q))
 		return rq;
 
 	/*
@@ -503,8 +505,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	}
 
 	/*
-	 * For a zoned block device, if we only have writes queued and none of
-	 * them can be dispatched, rq will be NULL.
+	 * For a zoned block device that requires write serialization, if we
+	 * only have writes queued and none of them can be dispatched, rq will
+	 * be NULL.
 	 */
 	if (!rq)
 		return NULL;
@@ -893,7 +896,8 @@ static void dd_finish_request(struct request *rq)
 
 	atomic_inc(&per_prio->stats.completed);
 
-	if (blk_queue_is_zoned(q)) {
+	if (blk_queue_is_zoned(rq->q) &&
+	    !blk_queue_pipeline_zoned_writes(q)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&dd->zone_lock, flags);


* [PATCH 5/8] block/null_blk: Refactor null_queue_rq()
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-01-09 23:27 ` [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:27 ` [PATCH 6/8] block/null_blk: Add support for pipelining zoned writes Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Introduce a local variable for the expression bd->rq since that expression
occurs multiple times. This patch does not change any functionality.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 4c601ca9552a..d401230b1e20 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1660,10 +1660,11 @@ static enum blk_eh_timer_return null_timeout_rq(struct request *rq)
 static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 			 const struct blk_mq_queue_data *bd)
 {
-	struct nullb_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+	struct request *rq = bd->rq;
+	struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
 	struct nullb_queue *nq = hctx->driver_data;
-	sector_t nr_sectors = blk_rq_sectors(bd->rq);
-	sector_t sector = blk_rq_pos(bd->rq);
+	sector_t nr_sectors = blk_rq_sectors(rq);
+	sector_t sector = blk_rq_pos(rq);
 	const bool is_poll = hctx->type == HCTX_TYPE_POLL;
 
 	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
@@ -1672,14 +1673,14 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 		cmd->timer.function = null_cmd_timer_expired;
 	}
-	cmd->rq = bd->rq;
+	cmd->rq = rq;
 	cmd->error = BLK_STS_OK;
 	cmd->nq = nq;
-	cmd->fake_timeout = should_timeout_request(bd->rq);
+	cmd->fake_timeout = should_timeout_request(rq);
 
-	blk_mq_start_request(bd->rq);
+	blk_mq_start_request(rq);
 
-	if (should_requeue_request(bd->rq)) {
+	if (should_requeue_request(rq)) {
 		/*
 		 * Alternate between hitting the core BUSY path, and the
 		 * driver driven requeue path
@@ -1688,21 +1689,21 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 		if (nq->requeue_selection & 1)
 			return BLK_STS_RESOURCE;
 		else {
-			blk_mq_requeue_request(bd->rq, true);
+			blk_mq_requeue_request(rq, true);
 			return BLK_STS_OK;
 		}
 	}
 
 	if (is_poll) {
 		spin_lock(&nq->poll_lock);
-		list_add_tail(&bd->rq->queuelist, &nq->poll_list);
+		list_add_tail(&rq->queuelist, &nq->poll_list);
 		spin_unlock(&nq->poll_lock);
 		return BLK_STS_OK;
 	}
 	if (cmd->fake_timeout)
 		return BLK_STS_OK;
 
-	return null_handle_cmd(cmd, sector, nr_sectors, req_op(bd->rq));
+	return null_handle_cmd(cmd, sector, nr_sectors, req_op(rq));
 }
 
 static void cleanup_queue(struct nullb_queue *nq)


* [PATCH 6/8] block/null_blk: Add support for pipelining zoned writes
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-01-09 23:27 ` [PATCH 5/8] block/null_blk: Refactor null_queue_rq() Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:27 ` [PATCH 7/8] scsi: Retry unaligned " Bart Van Assche
  2023-01-09 23:27 ` [PATCH 8/8] scsi: ufs: Enable zoned write pipelining Bart Van Assche
  7 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche, Damien Le Moal

Add a new configfs attribute for enabling pipelining of zoned writes. If
that attribute has been set, retry zoned writes that are not aligned with
the write pointer. The test script below reports 234 K IOPS with no I/O
scheduler, 5.32 K IOPS with mq-deadline and pipelining disabled, and 92.2 K
IOPS with mq-deadline and pipelining enabled. This shows that pipelining
results in about 17 times more IOPS for this particular test case.

    #!/bin/bash

    for mode in "none 0" "mq-deadline 0" "mq-deadline 1"; do
        set +e
        for d in /sys/kernel/config/nullb/*; do
            [ -d "$d" ] && rmdir "$d"
        done
        modprobe -r null_blk
        set -e
        read -r iosched pipelining <<<"$mode"
        modprobe null_blk nr_devices=0
        (
            cd /sys/kernel/config/nullb
            mkdir nullb0
            cd nullb0
            params=(
                completion_nsec=100000
                hw_queue_depth=64
                irqmode=2                # NULL_IRQ_TIMER
                max_sectors=$((4096/512))
                memory_backed=1
                pipeline_zoned_writes="${pipelining}"
                size=1
                submit_queues=1
                zone_size=1
                zoned=1
                power=1
            )
            for p in "${params[@]}"; do
                echo "${p//*=}" > "${p//=*}"
            done
        )
        udevadm settle
        dev=/dev/nullb0
        [ -b "${dev}" ]
        params=(
            --direct=1
            --filename="${dev}"
            --iodepth=64
            --iodepth_batch=16
            --ioengine=io_uring
            --ioscheduler="${iosched}"
            --gtod_reduce=1
            --hipri=0
            --name=nullb0
            --runtime=30
            --rw=write
            --time_based=1
            --zonemode=zbd
        )
        fio "${params[@]}"
    done

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c     |  9 +++++++++
 drivers/block/null_blk/null_blk.h |  3 +++
 drivers/block/null_blk/zoned.c    | 13 ++++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index d401230b1e20..851b55b7284f 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -424,6 +424,7 @@ NULLB_DEVICE_ATTR(zone_capacity, ulong, NULL);
 NULLB_DEVICE_ATTR(zone_nr_conv, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
+NULLB_DEVICE_ATTR(pipeline_zoned_writes, bool, NULL);
 NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
 NULLB_DEVICE_ATTR(no_sched, bool, NULL);
 NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
@@ -569,6 +570,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_zone_max_active,
 	&nullb_device_attr_zone_readonly,
 	&nullb_device_attr_zone_offline,
+	&nullb_device_attr_pipeline_zoned_writes,
 	&nullb_device_attr_virt_boundary,
 	&nullb_device_attr_no_sched,
 	&nullb_device_attr_shared_tag_bitmap,
@@ -1677,6 +1679,11 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
 	cmd->error = BLK_STS_OK;
 	cmd->nq = nq;
 	cmd->fake_timeout = should_timeout_request(rq);
+	if (!(rq->rq_flags & RQF_DONTPREP)) {
+		rq->rq_flags |= RQF_DONTPREP;
+		cmd->retries = 0;
+		cmd->max_attempts = rq->q->nr_requests;
+	}
 
 	blk_mq_start_request(rq);
 
@@ -2109,6 +2116,8 @@ static int null_add_dev(struct nullb_device *dev)
 	nullb->q->queuedata = nullb;
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
 	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
+	if (dev->pipeline_zoned_writes)
+		blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, nullb->q);
 
 	mutex_lock(&lock);
 	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index eb5972c50be8..c44c3fdb1025 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -23,6 +23,8 @@ struct nullb_cmd {
 	unsigned int tag;
 	blk_status_t error;
 	bool fake_timeout;
+	u16 retries;
+	u16 max_attempts;
 	struct nullb_queue *nq;
 	struct hrtimer timer;
 };
@@ -112,6 +114,7 @@ struct nullb_device {
 	bool memory_backed; /* if data is stored in memory */
 	bool discard; /* if support discard */
 	bool zoned; /* if device is zoned */
+	bool pipeline_zoned_writes;
 	bool virt_boundary; /* virtual boundary on/off for the device */
 	bool no_sched; /* no IO scheduler for the device */
 	bool shared_tag_bitmap; /* use hostwide shared tags */
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 635ce0648133..cffc4985a7df 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -405,7 +405,18 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 		else
 			cmd->bio->bi_iter.bi_sector = sector;
 	} else if (sector != zone->wp) {
-		ret = BLK_STS_IOERR;
+		/*
+		 * In case of a misaligned write and if pipelining of zoned
+		 * writes has been enabled, request the block layer to retry
+		 * until the maximum number of attempts has been reached. If
+		 * the maximum number of attempts has been reached, fail the
+		 * misaligned write.
+		 */
+		if (dev->pipeline_zoned_writes &&
+		    ++cmd->retries < cmd->max_attempts)
+			ret = BLK_STS_DEV_RESOURCE;
+		else
+			ret = BLK_STS_IOERR;
 		goto unlock;
 	}
 


* [PATCH 7/8] scsi: Retry unaligned zoned writes
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-01-09 23:27 ` [PATCH 6/8] block/null_blk: Add support for pipelining zoned writes Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-09 23:51   ` Damien Le Moal
  2023-01-09 23:27 ` [PATCH 8/8] scsi: ufs: Enable zoned write pipelining Bart Van Assche
  7 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche,
	Martin K. Petersen, Damien Le Moal

From ZBC-2: "The device server terminates with CHECK CONDITION status, with
the sense key set to ILLEGAL REQUEST, and the additional sense code set to
UNALIGNED WRITE COMMAND a write command, other than an entire medium write
same command, that specifies: a) the starting LBA in a sequential write
required zone set to a value that is not equal to the write pointer for that
sequential write required zone; or b) an ending LBA that is not equal to the
last logical block within a physical block (see SBC-5)."

I am not aware of any other conditions that may trigger the UNALIGNED
WRITE COMMAND response.

Retry unaligned writes in preparation for removing zone locking.

Increase the number of retries for write commands sent to a sequential
zone to the maximum number of outstanding commands.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_error.c | 7 +++++++
 drivers/scsi/sd.c         | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index a7960ad2d386..e17f36b0b18a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -676,6 +676,13 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		fallthrough;
 
 	case ILLEGAL_REQUEST:
+		/*
+		 * Unaligned write command. Retry immediately if pipelining
+		 * zoned writes has been enabled.
+		 */
+		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
+		    blk_queue_pipeline_zoned_writes(sdev->request_queue))
+			return NEEDS_RETRY;
 		if (sshdr.asc == 0x20 || /* Invalid command operation code */
 		    sshdr.asc == 0x21 || /* Logical block address out of range */
 		    sshdr.asc == 0x22 || /* Invalid function */
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 47dafe6b8a66..cd90b54a6597 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1207,6 +1207,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	cmd->transfersize = sdp->sector_size;
 	cmd->underflow = nr_blocks << 9;
 	cmd->allowed = sdkp->max_retries;
+	if (blk_queue_pipeline_zoned_writes(rq->q) &&
+	    blk_rq_is_seq_zone_write(rq))
+		cmd->allowed += rq->q->nr_requests;
 	cmd->sdb.length = nr_blocks * sdp->sector_size;
 
 	SCSI_LOG_HLQUEUE(1,


* [PATCH 8/8] scsi: ufs: Enable zoned write pipelining
  2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
                   ` (6 preceding siblings ...)
  2023-01-09 23:27 ` [PATCH 7/8] scsi: Retry unaligned " Bart Van Assche
@ 2023-01-09 23:27 ` Bart Van Assche
  2023-01-10  9:16   ` Avri Altman
  2023-01-10 12:23   ` Bean Huo
  7 siblings, 2 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:27 UTC
  To: Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Bart Van Assche,
	Martin K. Petersen, Damien Le Moal

From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Note: for legacy mode this is only correct if the host submits one
command at a time. The UFS driver does this.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e18c9f4463ec..9198505e953b 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5031,6 +5031,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 
 	ufshcd_hpb_configure(hba, sdev);
 
+	blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
 	if (hba->quirks & UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE)
 		blk_queue_update_dma_alignment(q, PAGE_SIZE - 1);


* Re: [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq()
  2023-01-09 23:27 ` [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq() Bart Van Assche
@ 2023-01-09 23:36   ` Damien Le Moal
  0 siblings, 0 replies; 37+ messages in thread
From: Damien Le Moal @ 2023-01-09 23:36 UTC
  To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Avri Altman

On 1/10/23 08:27, Bart Van Assche wrote:
> Since it is nontrivial to figure out how disk_zone_is_seq() and
> blk_rq_zone_is_seq() handle sequential write preferred zones, document
> this.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  include/linux/blk-mq.h | 7 +++++++
>  include/linux/blkdev.h | 9 +++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 779fba613bd0..6735db1ad24d 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -1155,6 +1155,13 @@ static inline unsigned int blk_rq_zone_no(struct request *rq)
>  	return disk_zone_no(rq->q->disk, blk_rq_pos(rq));
>  }
>  
> +/**
> + * blk_rq_zone_is_seq() - Whether a request is for a sequential zone.
> + * @rq: Request pointer.
> + *
> + * Return: true if and only if blk_rq_pos(@rq) refers either to a sequential
> + * write required or to a sequential write preferred zone.

Maybe change this to "...blk_rq_pos(@rq) is targeting either a sequential
write required zone or a sequential write preferred zone."

> + */
>  static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
>  {
>  	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b87ed829ab94..ef93e848b1fd 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -672,6 +672,15 @@ static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
>  	return sector >> ilog2(disk->queue->limits.chunk_sectors);
>  }
>  
> +/**
> + * disk_zone_is_seq() - Whether a logical block is in a sequential zone.
> + * @disk: Disk pointer.
> + * @sector: Offset from start of block device in 512 byte units.
> + *
> + * Return: true if and only if @disk refers to a zoned block device and
> + * @sector refers either to a sequential write required or a sequential
> + * write preferred zone.

Maybe change this to "...and @sector is contained either in a sequential
write required zone or a sequential write preferred zone."

> + */
>  static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)
>  {
>  	if (!blk_queue_is_zoned(disk->queue))

With that fixed,

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-09 23:27 ` [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function Bart Van Assche
@ 2023-01-09 23:38   ` Damien Le Moal
  2023-01-09 23:52     ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-09 23:38 UTC
  To: Bart Van Assche, Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Avri Altman

On 1/10/23 08:27, Bart Van Assche wrote:
> Introduce a function that makes it easy to verify whether a write
> request is for a sequential write required or sequential write preferred
> zone.
> 
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  include/linux/blk-mq.h | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 6735db1ad24d..29f834e5c3a9 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -1167,6 +1167,24 @@ static inline unsigned int blk_rq_zone_is_seq(struct request *rq)
>  	return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq));
>  }
>  
> +/**
> + * blk_rq_is_seq_zone_write() - Whether @rq is a write request for a sequential zone.
> + * @rq: Request to examine.
> + *
> + * In this context sequential zone means either a sequential write required or
> + * to a sequential write preferred zone.

...means either a sequential write required zone or a sequential write
preferred zone.

> + */
> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
> +{
> +	switch (req_op(rq)) {
> +	case REQ_OP_WRITE:
> +	case REQ_OP_WRITE_ZEROES:

REQ_OP_ZONE_APPEND ?

> +		return blk_rq_zone_is_seq(rq);
> +	default:
> +		return false;
> +	}
> +}
> +
>  bool blk_req_needs_zone_write_lock(struct request *rq);
>  bool blk_req_zone_write_trylock(struct request *rq);
>  void __blk_req_zone_write_lock(struct request *rq);
> @@ -1197,6 +1215,11 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
>  	return !blk_req_zone_is_write_locked(rq);
>  }
>  #else /* CONFIG_BLK_DEV_ZONED */
> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
> +{
> +	return false;
> +}
> +
>  static inline bool blk_req_needs_zone_write_lock(struct request *rq)
>  {
>  	return false;

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-09 23:27 ` [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
@ 2023-01-09 23:46   ` Damien Le Moal
  2023-01-09 23:51     ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-09 23:46 UTC
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/10/23 08:27, Bart Van Assche wrote:
> Measurements have shown that limiting the queue depth to one for zoned
> writes has a significant negative performance impact on zoned UFS devices.
> Hence this patch that disables zone locking from the mq-deadline scheduler
> for storage controllers that support pipelining zoned writes. This patch is
> based on the following assumptions:
> - Applications submit write requests to sequential write required zones
>   in order.
> - It happens infrequently that zoned write requests are reordered by the
>   block layer.
> - The storage controller does not reorder write requests that have been
>   submitted to the same hardware queue. This is the case for UFS: the
>   UFSHCI specification requires that UFS controllers process requests in
>   order per hardware queue.
> - The I/O priority of all pipelined write requests is the same per zone.
> - Either no I/O scheduler is used or an I/O scheduler is used that
>   submits write requests per zone in LBA order.
> 
> If applications submit write requests to sequential write required zones
> in order, at least one of the pending requests will succeed. Hence, the
> number of retries that is required is at most (number of pending
> requests) - 1.

But if the mid-layer decides to requeue a write request, the workqueue
used in the mq block layer for requeuing is going to completely destroy
write ordering as that is outside of the submission path, working in
parallel with it... Does blk_queue_pipeline_zoned_writes() == true also
guarantee that a write request will *never* be requeued before hitting the
adapter/device ?

> 
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-zoned.c   |  3 ++-
>  block/mq-deadline.c | 14 +++++++++-----
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index db829401d8d0..158638091e39 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -520,7 +520,8 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
>  		break;
>  	case BLK_ZONE_TYPE_SEQWRITE_REQ:
>  	case BLK_ZONE_TYPE_SEQWRITE_PREF:
> -		if (!args->seq_zones_wlock) {
> +		if (!blk_queue_pipeline_zoned_writes(q) &&
> +		    !args->seq_zones_wlock) {
>  			args->seq_zones_wlock =
>  				blk_alloc_zone_bitmap(q->node, args->nr_zones);
>  			if (!args->seq_zones_wlock)
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index f10c2a0d18d4..e41808c0b007 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -339,7 +339,8 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  		return NULL;
>  
>  	rq = rq_entry_fifo(per_prio->fifo_list[data_dir].next);
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
> +	    blk_queue_pipeline_zoned_writes(rq->q))
>  		return rq;
>  
>  	/*
> @@ -378,7 +379,8 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	if (!rq)
>  		return NULL;
>  
> -	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q))
> +	if (data_dir == DD_READ || !blk_queue_is_zoned(rq->q) ||
> +	    blk_queue_pipeline_zoned_writes(rq->q))
>  		return rq;
>  
>  	/*
> @@ -503,8 +505,9 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
>  	}
>  
>  	/*
> -	 * For a zoned block device, if we only have writes queued and none of
> -	 * them can be dispatched, rq will be NULL.
> +	 * For a zoned block device that requires write serialization, if we
> +	 * only have writes queued and none of them can be dispatched, rq will
> +	 * be NULL.
>  	 */
>  	if (!rq)
>  		return NULL;
> @@ -893,7 +896,8 @@ static void dd_finish_request(struct request *rq)
>  
>  	atomic_inc(&per_prio->stats.completed);
>  
> -	if (blk_queue_is_zoned(q)) {
> +	if (blk_queue_is_zoned(rq->q) &&
> +	    !blk_queue_pipeline_zoned_writes(q)) {
>  		unsigned long flags;
>  
>  		spin_lock_irqsave(&dd->zone_lock, flags);

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-09 23:46   ` Damien Le Moal
@ 2023-01-09 23:51     ` Bart Van Assche
  2023-01-09 23:56       ` Damien Le Moal
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:51 UTC
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 15:46, Damien Le Moal wrote:
> On 1/10/23 08:27, Bart Van Assche wrote:
>> Measurements have shown that limiting the queue depth to one for zoned
>> writes has a significant negative performance impact on zoned UFS devices.
>> Hence this patch that disables zone locking from the mq-deadline scheduler
>> for storage controllers that support pipelining zoned writes. This patch is
>> based on the following assumptions:
>> - Applications submit write requests to sequential write required zones
>>    in order.
>> - It happens infrequently that zoned write requests are reordered by the
>>    block layer.
>> - The storage controller does not reorder write requests that have been
>>    submitted to the same hardware queue. This is the case for UFS: the
>>    UFSHCI specification requires that UFS controllers process requests in
>>    order per hardware queue.
>> - The I/O priority of all pipelined write requests is the same per zone.
>> - Either no I/O scheduler is used or an I/O scheduler is used that
>>    submits write requests per zone in LBA order.
>>
>> If applications submit write requests to sequential write required zones
>> in order, at least one of the pending requests will succeed. Hence, the
>> number of retries that is required is at most (number of pending
>> requests) - 1.
> 
> But if the mid-layer decides to requeue a write request, the workqueue
> used in the mq block layer for requeuing is going to completely destroy
> write ordering as that is outside of the submission path, working in
> parallel with it... Does blk_queue_pipeline_zoned_writes() == true also
> guarantee that a write request will *never* be requeued before hitting the
> adapter/device ?

We don't need the guarantee that reordering will never happen. What we 
need is that reordering happens infrequently (e.g. less than 1% of the 
cases). This is what the last paragraph before your reply refers to. 
Maybe I should expand that paragraph.

Thanks,

Bart.



* Re: [PATCH 7/8] scsi: Retry unaligned zoned writes
  2023-01-09 23:27 ` [PATCH 7/8] scsi: Retry unaligned " Bart Van Assche
@ 2023-01-09 23:51   ` Damien Le Moal
  2023-01-09 23:55     ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-09 23:51 UTC
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Martin K. Petersen

On 1/10/23 08:27, Bart Van Assche wrote:
> From ZBC-2: "The device server terminates with CHECK CONDITION status, with
> the sense key set to ILLEGAL REQUEST, and the additional sense code set to
> UNALIGNED WRITE COMMAND a write command, other than an entire medium write
> same command, that specifies: a) the starting LBA in a sequential write
> required zone set to a value that is not equal to the write pointer for that
> sequential write required zone; or b) an ending LBA that is not equal to the
> last logical block within a physical block (see SBC-5)."
> 
> I am not aware of any other conditions that may trigger the UNALIGNED
> WRITE COMMAND response.
> 
> Retry unaligned writes in preparation of removing zone locking.
> 
> Increase the number of retries for write commands sent to a sequential
> zone to the maximum number of outstanding commands.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 7 +++++++
>  drivers/scsi/sd.c         | 3 +++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index a7960ad2d386..e17f36b0b18a 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -676,6 +676,13 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		fallthrough;
>  
>  	case ILLEGAL_REQUEST:
> +		/*
> +		 * Unaligned write command. Retry immediately if pipelining
> +		 * zoned writes has been enabled.
> +		 */
> +		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
> +		    blk_queue_pipeline_zoned_writes(sdev->request_queue))
> +			return NEEDS_RETRY;
>  		if (sshdr.asc == 0x20 || /* Invalid command operation code */
>  		    sshdr.asc == 0x21 || /* Logical block address out of range */
>  		    sshdr.asc == 0x22 || /* Invalid function */
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 47dafe6b8a66..cd90b54a6597 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1207,6 +1207,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	cmd->transfersize = sdp->sector_size;
>  	cmd->underflow = nr_blocks << 9;
>  	cmd->allowed = sdkp->max_retries;
> +	if (blk_queue_pipeline_zoned_writes(rq->q) &&
> +	    blk_rq_is_seq_zone_write(rq))
> +		cmd->allowed += rq->q->nr_requests;
>  	cmd->sdb.length = nr_blocks * sdp->sector_size;
>  
>  	SCSI_LOG_HLQUEUE(1,

Aouch... The amount of IO errors logged in the kernel log will not be
pretty, no ?

At least for scsi disks, the problem is that a write may partially fail
(the command hit a bad sector). Such write failure will cause all queued
writes after it to fail and no amount of retry will help. Not sure about
UFS devices though. Can you get a partial failure ?


-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-09 23:38   ` Damien Le Moal
@ 2023-01-09 23:52     ` Bart Van Assche
  2023-01-10  9:52       ` Niklas Cassel
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:52 UTC
  To: Damien Le Moal, Jens Axboe; +Cc: linux-block, Jaegeuk Kim, Avri Altman

On 1/9/23 15:38, Damien Le Moal wrote:
> On 1/10/23 08:27, Bart Van Assche wrote:
>> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
>> +{
>> +	switch (req_op(rq)) {
>> +	case REQ_OP_WRITE:
>> +	case REQ_OP_WRITE_ZEROES:
> 
> REQ_OP_ZONE_APPEND ?

I will add REQ_OP_ZONE_APPEND.

Thanks,

Bart.



* Re: [PATCH 7/8] scsi: Retry unaligned zoned writes
  2023-01-09 23:51   ` Damien Le Moal
@ 2023-01-09 23:55     ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-09 23:55 UTC
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Martin K. Petersen

On 1/9/23 15:51, Damien Le Moal wrote:
> On 1/10/23 08:27, Bart Van Assche wrote:
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 47dafe6b8a66..cd90b54a6597 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1207,6 +1207,9 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>>   	cmd->transfersize = sdp->sector_size;
>>   	cmd->underflow = nr_blocks << 9;
>>   	cmd->allowed = sdkp->max_retries;
>> +	if (blk_queue_pipeline_zoned_writes(rq->q) &&
>> +	    blk_rq_is_seq_zone_write(rq))
>> +		cmd->allowed += rq->q->nr_requests;
>>   	cmd->sdb.length = nr_blocks * sdp->sector_size;
>>   
>>   	SCSI_LOG_HLQUEUE(1,
> 
> Aouch... The amount of IO errors logged in the kernel log will not be
> pretty, no ?
> 
> At least for scsi disks, the problem is that a write may partially fail
> (the command hit a bad sector). Such write failure will cause all queued
> writes after it to fail and no amount of retry will help. Not sure about
> UFS devices though. Can you get a partial failure ?

Although the UFS specification supports residuals, one of the open bugs 
in the UFS driver is that the residual field from the UFS device 
response is ignored. I think this means that no UFS device vendor cares 
enough about partial failures to implement support for partial failures.

Please let me know if you want me to look into suppressing error 
messages triggered by this new retry mechanism.

Thanks,

Bart.


* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-09 23:51     ` Bart Van Assche
@ 2023-01-09 23:56       ` Damien Le Moal
  2023-01-10  0:19         ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-09 23:56 UTC
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/10/23 08:51, Bart Van Assche wrote:
> On 1/9/23 15:46, Damien Le Moal wrote:
>> On 1/10/23 08:27, Bart Van Assche wrote:
>>> Measurements have shown that limiting the queue depth to one for zoned
>>> writes has a significant negative performance impact on zoned UFS devices.
>>> Hence this patch that disables zone locking from the mq-deadline scheduler
>>> for storage controllers that support pipelining zoned writes. This patch is
>>> based on the following assumptions:
>>> - Applications submit write requests to sequential write required zones
>>>    in order.
>>> - It happens infrequently that zoned write requests are reordered by the
>>>    block layer.
>>> - The storage controller does not reorder write requests that have been
>>>    submitted to the same hardware queue. This is the case for UFS: the
>>>    UFSHCI specification requires that UFS controllers process requests in
>>>    order per hardware queue.
>>> - The I/O priority of all pipelined write requests is the same per zone.
>>> - Either no I/O scheduler is used or an I/O scheduler is used that
>>>    submits write requests per zone in LBA order.
>>>
>>> If applications submit write requests to sequential write required zones
>>> in order, at least one of the pending requests will succeed. Hence, the
>>> number of retries that is required is at most (number of pending
>>> requests) - 1.
>>
>> But if the mid-layer decides to requeue a write request, the workqueue
>> used in the mq block layer for requeuing is going to completely destroy
>> write ordering as that is outside of the submission path, working in
>> parallel with it... Does blk_queue_pipeline_zoned_writes() == true also
>> guarantee that a write request will *never* be requeued before hitting the
>> adapter/device ?
> 
> We don't need the guarantee that reordering will never happen. What we 
> need is that reordering happens infrequently (e.g. less than 1% of the 
> cases). This is what the last paragraph before your reply refers to. 
> Maybe I should expand that paragraph.

But my point is that if a request goes through the block layer requeue, it
will be out of order, and will be submitted out of order again, and will
fail again. Unless you stall dispatching, wait for all requeues to come
back in the scheduler, and then start trying again, I do not see how you
can guarantee that retrying the unaligned writes will ever succeed.

I am talking in the context of host-managed devices here.

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-09 23:56       ` Damien Le Moal
@ 2023-01-10  0:19         ` Bart Van Assche
  2023-01-10  0:32           ` Damien Le Moal
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10  0:19 UTC
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 15:56, Damien Le Moal wrote:
> But my point is that if a request goes through the block layer requeue, it
> will be out of order, and will be submitted out of order again, and will
> fail again. Unless you stall dispatching, wait for all requeues to come
> back in the scheduler, and then start trying again, I do not see how you
> can guarantee that retrying the unaligned writes will ever succeed.
> 
> I am talking in the context of host-managed devices here.

Hi Damien,

How about changing the NEEDS_RETRY in patch 7/8 to a value other than 
SUCCESS, NEEDS_RETRY or ADD_TO_MLQUEUE? That will make the SCSI core 
wait until all pending commands have finished before it starts its error 
handling strategy and before it requeues any pending commands.

Thanks,

Bart.


* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:19         ` Bart Van Assche
@ 2023-01-10  0:32           ` Damien Le Moal
  2023-01-10  0:38             ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-10  0:32 UTC
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/10/23 09:19, Bart Van Assche wrote:
> On 1/9/23 15:56, Damien Le Moal wrote:
>> But my point is that if a request goes through the block layer requeue, it
>> will be out of order, and will be submitted out of order again, and will
>> fail again. Unless you stall dispatching, wait for all requeues to come
>> back in the scheduler, and then start trying again, I do not see how you
>> can guarantee that retrying the unaligned writes will ever succeed.
>>
>> I am talking in the context of host-managed devices here.
> 
> Hi Damien,
> 
> How about changing the NEEDS_RETRY in patch 7/8 to a value other than 
> SUCCESS, NEEDS_RETRY or ADD_TO_MLQUEUE? That will make the SCSI core 
> wait until all pending commands have finished before it starts its error 
> handling strategy and before it requeues any pending commands.

Considering a sequence of sequential write requests, each request can be in:
1) The scheduler
2) The dispatch queue (out of the scheduler)
3) The requeue list, waiting to be put back in the scheduler
4) In flight, being processed for dispatch by the SCSI mid-layer and SCSI
disk driver
5) Being processed by the driver
6) Already dispatched to the device

Requeue back to the scheduler can happen anywhere from (2) up to (5)
(would need to check again to be 100% sure though). So I do not see how
changes to the SCSI layer only, adding a new state, can cover all possible
cases and eventually converge back to a clean ordering. But if you have
ideas to prove me wrong, I will be happy to review that :)

So far, the only thing that I think could work is: stall everything, put
back all write requests in the scheduler, and restart dispatching. That
will definitely have a performance impact. How that compares to the
performance impact of zone write locking, I am not sure...

It may be way simpler to rely on:
1) none scheduler
2) some light re-ordering of write requests in the driver itself to avoid
any requeue to higher level (essentially, handle requeueing in the driver
itself).

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:32           ` Damien Le Moal
@ 2023-01-10  0:38             ` Jens Axboe
  2023-01-10  0:41               ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  0:38 UTC
  To: Damien Le Moal, Bart Van Assche
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 5:32 PM, Damien Le Moal wrote:
> It may be way simpler to rely on:
> 1) none scheduler
> 2) some light re-ordering of write requests in the driver itself to avoid
> any requeue to higher level (essentially, handle requeueing in the driver
> itself).

Let's not start adding requeueing back into the drivers, that was a
giant mess that we tried to get out of for a while.

And all of this strict ordering mess really makes me think that we
should just have a special scheduler mandated for devices that have that
kind of stupid constraints...

-- 
Jens Axboe



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:38             ` Jens Axboe
@ 2023-01-10  0:41               ` Jens Axboe
  2023-01-10  0:44                 ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  0:41 UTC
  To: Damien Le Moal, Bart Van Assche
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 5:38 PM, Jens Axboe wrote:
> On 1/9/23 5:32 PM, Damien Le Moal wrote:
>> It may be way simpler to rely on:
>> 1) none scheduler
>> 2) some light re-ordering of write requests in the driver itself to avoid
>> any requeue to higher level (essentially, handle requeueing in the driver
>> itself).
> 
> Let's not start adding requeueing back into the drivers, that was a
> giant mess that we tried to get out of for a while.
> 
> And all of this strict ordering mess really makes me think that we
> should just have a special scheduler mandated for devices that have that
> kind of stupid constraints...

Or, probably better, a stacked scheduler where the bottom one can be zone
aware. Then we can get rid of littering the entire stack and IO schedulers
with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.

It really is a mess right now...

-- 
Jens Axboe




* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:41               ` Jens Axboe
@ 2023-01-10  0:44                 ` Bart Van Assche
  2023-01-10  0:48                   ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10  0:44 UTC (permalink / raw)
  To: Jens Axboe, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 16:41, Jens Axboe wrote:
> Or, probably better, a stacked scheduler where the bottom one can be zone
> aware. Then we can get rid of littering the entire stack and IO schedulers
> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.

Hi Jens,

Isn't one of Damien's viewpoints that an I/O scheduler should not
reorder write requests, since reordering may involve waiting for write
requests that will never be received if all tags have been allocated?

Thanks,

Bart.


* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:44                 ` Bart Van Assche
@ 2023-01-10  0:48                   ` Jens Axboe
  2023-01-10  0:56                     ` Bart Van Assche
  2023-01-10  2:24                     ` Damien Le Moal
  0 siblings, 2 replies; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  0:48 UTC (permalink / raw)
  To: Bart Van Assche, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 5:44 PM, Bart Van Assche wrote:
> On 1/9/23 16:41, Jens Axboe wrote:
>> Or, probably better, a stacked scheduler where the bottom one can be zone
>> aware. Then we can get rid of littering the entire stack and IO schedulers
>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
> 
> Hi Jens,
> 
> Isn't one of Damien's viewpoints that an I/O scheduler should not
> reorder write requests, since reordering may involve waiting for write
> requests that will never be received if all tags have been allocated?

It should be work conserving; it should not wait for anything. If
there are holes or gaps, then there's nothing the scheduler can do.

My point is that the strict ordering was pretty hacky when it went in,
and rather than get better, it's proliferating. That's not a good
direction.

-- 
Jens Axboe



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:48                   ` Jens Axboe
@ 2023-01-10  0:56                     ` Bart Van Assche
  2023-01-10  1:03                       ` Jens Axboe
  2023-01-10  2:24                     ` Damien Le Moal
  1 sibling, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10  0:56 UTC (permalink / raw)
  To: Jens Axboe, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 16:48, Jens Axboe wrote:
> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>> On 1/9/23 16:41, Jens Axboe wrote:
>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>
>> Hi Jens,
>>
>> Isn't one of Damien's viewpoints that an I/O scheduler should not
>> reorder write requests, since reordering may involve waiting for write
>> requests that will never be received if all tags have been allocated?
> 
> It should be work conserving; it should not wait for anything. If
> there are holes or gaps, then there's nothing the scheduler can do.
> 
> My point is that the strict ordering was pretty hacky when it went in,
> and rather than get better, it's proliferating. That's not a good
> direction.

Hi Jens,

As you know, one of the deeply embedded design choices in the blk-mq code
is that reordering can happen at any time between submission of a 
request to the blk-mq code and request completion. I agree with that 
design choice.

For the use cases I'm looking at, the sequential write required zone
type works best, since it guarantees that data on the storage medium is
sequential. This results in optimal sequential read performance.

Combining these two approaches is not ideal and I agree that the
combination adds some complexity. Personally I prefer to add a limited
amount of complexity rather than implement a new block layer from
scratch.

Thanks,

Bart.




* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:56                     ` Bart Van Assche
@ 2023-01-10  1:03                       ` Jens Axboe
  2023-01-10  1:17                         ` Bart Van Assche
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  1:03 UTC (permalink / raw)
  To: Bart Van Assche, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 5:56 PM, Bart Van Assche wrote:
> On 1/9/23 16:48, Jens Axboe wrote:
>> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>>> On 1/9/23 16:41, Jens Axboe wrote:
>>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>>
>>> Hi Jens,
>>>
>>> Isn't one of Damien's viewpoints that an I/O scheduler should not
>>> reorder write requests, since reordering may involve waiting for write
>>> requests that will never be received if all tags have been allocated?
>>
>> It should be work conserving; it should not wait for anything. If
>> there are holes or gaps, then there's nothing the scheduler can do.
>>
>> My point is that the strict ordering was pretty hacky when it went in,
>> and rather than get better, it's proliferating. That's not a good
>> direction.
> 
> Hi Jens,
> 
> As you know, one of the deeply embedded design choices in the blk-mq
> code is that reordering can happen at any time between submission of a
> request to the blk-mq code and request completion. I agree with that
> design choice.

Indeed. And getting rid of any ordering ops like barriers greatly
simplified things and fixed a number of issues related to that.

> For the use cases I'm looking at, the sequential write required zone
> type works best, since it guarantees that data on the storage medium
> is sequential. This results in optimal sequential read performance.

That's a given.

> Combining these two approaches is not ideal and I agree that the
> combination adds some complexity. Personally I prefer to add a limited
> amount of complexity rather than implement a new block layer from
> scratch.

I'm not talking about a new block layer at all; ordered devices are not
nearly important enough to warrant that kind of attention. Nor would it
be a good solution even if they were. I'm merely saying that I'm getting
more and more disgruntled with the direction that is being taken to
cater to these kinds of devices, and perhaps a much better idea is to
contain that complexity in a separate scheduler (be it stacked or not).
Because I'm really not thrilled to see the addition of various "is this
device ordered" checks all over the place, and now we are getting "is
this device ordered AND pipelined". Do you see what I mean? It's making
things _worse_, not better, and we really should be making it better
rather than piling more stuff on top of it.

-- 
Jens Axboe



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  1:03                       ` Jens Axboe
@ 2023-01-10  1:17                         ` Bart Van Assche
  2023-01-10  1:48                           ` Jens Axboe
  0 siblings, 1 reply; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10  1:17 UTC (permalink / raw)
  To: Jens Axboe, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 17:03, Jens Axboe wrote:
> Because I'm really not thrilled to see the addition of various "is this
> device ordered" checks all over the place, and now we are getting "is
> this device ordered AND pipelined". Do you see what I mean? It's making
> things _worse_, not better, and we really should be making it better
> rather than piling more stuff on top of it.

Hi Jens,

I agree with you that the additional complexity is unfortunate.

For most zoned storage use cases a queue depth above one is not an
option if the zoned device expects zoned write commands in LBA order.
ATA controllers do not support preserving the command order. Transports
like NVMeOF do not support preserving the command order either. UFS is
an exception. The only use case supported by the UFS specification is a
1:1 connection between UFS controller and UFS device over a link with a
low BER. UFS controllers must preserve the command order per command
queue. I think this context is well suited for pipelining zoned write
commands.

Thanks,

Bart.




* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  1:17                         ` Bart Van Assche
@ 2023-01-10  1:48                           ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  1:48 UTC (permalink / raw)
  To: Bart Van Assche, Damien Le Moal
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 6:17 PM, Bart Van Assche wrote:
> On 1/9/23 17:03, Jens Axboe wrote:
>> Because I'm really not thrilled to see the addition of various "is this
>> device ordered" checks all over the place, and now we are getting "is
>> this device ordered AND pipelined". Do you see what I mean? It's making
>> things _worse_, not better, and we really should be making it better
>> rather than piling more stuff on top of it.
> 
> Hi Jens,
> 
> I agree with you that the additional complexity is unfortunate.
> 
> For most zoned storage use cases a queue depth above one is not an
> option if the zoned device expects zoned write commands in LBA order.
> ATA controllers do not support preserving the command order.
> Transports like NVMeOF do not support preserving the command order
> either. UFS is an exception. The only use case supported by the UFS
> specification is a 1:1 connection between UFS controller and UFS
> device over a link with a low BER. UFS controllers must preserve the
> command order per command queue. I think this context is well suited
> for pipelining zoned write commands.

But it should not matter if the scheduler handles it and requeues are
ordered correctly. If the queue depth isn't known at init time, surely
we'd get a retry condition on submitting a request if it can't accept
another one. That'd trigger a retry, and the retry should be the first
one submitted when the device can accept another one.

Any setup that handles queue depth > 1 will do just fine at 1 as well.
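
A minimal sketch of that flow in a blk-mq ->queue_rq() callback
(example_queue_rq() and device_has_free_slot() are hypothetical names,
not code from this series):

static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
				     const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;

	/*
	 * If the device cannot accept another command, ask blk-mq to
	 * requeue. blk-mq keeps such a request at the head of the
	 * dispatch list, so it is sent first when the queue runs again.
	 */
	if (!device_has_free_slot(hctx->queue))
		return BLK_STS_DEV_RESOURCE;

	blk_mq_start_request(rq);
	/* ... hand the request to the hardware here ... */
	return BLK_STS_OK;
}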

-- 
Jens Axboe



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  0:48                   ` Jens Axboe
  2023-01-10  0:56                     ` Bart Van Assche
@ 2023-01-10  2:24                     ` Damien Le Moal
  2023-01-10  3:00                       ` Jens Axboe
  1 sibling, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-10  2:24 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/10/23 09:48, Jens Axboe wrote:
> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>> On 1/9/23 16:41, Jens Axboe wrote:
>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>
>> Hi Jens,
>>
>> Isn't one of Damien's viewpoints that an I/O scheduler should not
>> reorder write requests, since reordering may involve waiting for write
>> requests that will never be received if all tags have been allocated?
> 
> It should be work conserving; it should not wait for anything. If
> there are holes or gaps, then there's nothing the scheduler can do.
> 
> My point is that the strict ordering was pretty hacky when it went in,
> and rather than get better, it's proliferating. That's not a good
> direction.

Yes, and it is hard to maintain without breaking something.

Given that only writes need special handling, I am thinking that having a
dedicated write queue for submission/scheduling/requeue could
significantly clean things up. Essentially, we would have a different code
path for zoned device writes from submit_bio(). Something like:

if (queue_is_zoned() && op_is_write())
	return blk_zoned_write_submit();

at the top of submit_bio(). That zone write code can be isolated in
block/blk-zoned.c and avoid spreading "if (zoned)" all over the place.
E.g. the flush machinery reorders writes right now... That needs fixing,
more "if (zoned)" coming...

That special zone write queue could also do its own dispatch scheduling,
so no need to hack existing schedulers.

Need to try coding something to see how it goes.
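
A rough sketch of the shape of this idea (blk_zoned_write_submit() and
blk_zone_wq_add() are hypothetical names; nothing here is tested):

/* At the top of submit_bio() in block/blk-core.c: */
void submit_bio(struct bio *bio)
{
	if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
		return blk_zoned_write_submit(bio);
	/* ... existing submission path for everything else ... */
}

/* In block/blk-zoned.c: */
void blk_zoned_write_submit(struct bio *bio)
{
	struct gendisk *disk = bio->bi_bdev->bd_disk;
	unsigned int zno = disk_zone_no(disk, bio->bi_iter.bi_sector);

	/*
	 * Hypothetical per-zone write queue: writes for a zone are
	 * dispatched strictly in LBA order, including after a requeue,
	 * which keeps the "if (zoned)" checks out of the generic code.
	 */
	blk_zone_wq_add(disk, zno, bio);
}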

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary
  2023-01-10  2:24                     ` Damien Le Moal
@ 2023-01-10  3:00                       ` Jens Axboe
  0 siblings, 0 replies; 37+ messages in thread
From: Jens Axboe @ 2023-01-10  3:00 UTC (permalink / raw)
  To: Damien Le Moal, Bart Van Assche
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Damien Le Moal

On 1/9/23 7:24 PM, Damien Le Moal wrote:
> On 1/10/23 09:48, Jens Axboe wrote:
>> On 1/9/23 5:44 PM, Bart Van Assche wrote:
>>> On 1/9/23 16:41, Jens Axboe wrote:
>>>> Or, probably better, a stacked scheduler where the bottom one can be zone
>>>> aware. Then we can get rid of littering the entire stack and IO schedulers
>>>> with silly blk_queue_pipeline_zoned_writes() or blk_is_zoned_write() etc.
>>>
>>> Hi Jens,
>>>
>>> Isn't one of Damien's viewpoints that an I/O scheduler should not
>>> reorder write requests, since reordering may involve waiting for write
>>> requests that will never be received if all tags have been allocated?
>>
>> It should be work conserving; it should not wait for anything. If
>> there are holes or gaps, then there's nothing the scheduler can do.
>>
>> My point is that the strict ordering was pretty hacky when it went in,
>> and rather than get better, it's proliferating. That's not a good
>> direction.
> 
> Yes, and it is hard to maintain without breaking something.

Indeed! It's both fragile and ends up adding branches in a bunch of
spots in the generic code, which isn't ideal either from an efficiency
pov.

> Given that only writes need special handling, I am thinking that having a
> dedicated write queue for submission/scheduling/requeue could
> significantly clean things up. Essentially, we would have a different code
> path for zoned device writes from submit_bio(). Something like:
> 
> if (queue_is_zoned() && op_is_write())
> 	return blk_zoned_write_submit();
> 
> at the top of submit_bio(). That zone write code can be isolated in
> block/blk-zoned.c and avoid spreading "if (zoned)" all over the place.
> E.g. the flush machinery reorders writes right now... That needs fixing,
> more "if (zoned)" coming...
> 
> That special zone write queue could also do its own dispatch scheduling,
> so no need to hack existing schedulers.

This seems very reasonable, and would just have the one check at queue
time, and then one at requeue time (which is fine, that's not a fast
path in any case).

-- 
Jens Axboe



* RE: [PATCH 8/8] scsi: ufs: Enable zoned write pipelining
  2023-01-09 23:27 ` [PATCH 8/8] scsi: ufs: Enable zoned write pipelining Bart Van Assche
@ 2023-01-10  9:16   ` Avri Altman
  2023-01-10 17:42     ` Bart Van Assche
  2023-01-10 12:23   ` Bean Huo
  1 sibling, 1 reply; 37+ messages in thread
From: Avri Altman @ 2023-01-10  9:16 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Martin K . Petersen, Damien Le Moal

Please include the linux-scsi list as well to allow UFS stakeholders to be aware and comment.

Thanks,
Avri

> From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
> 
> From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
> 
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
>    pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
> 
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
> 
> Note: for legacy mode this is only correct if the host submits one
> command at a time. The UFS driver does this.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index e18c9f4463ec..9198505e953b 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5031,6 +5031,7 @@ static int ufshcd_slave_configure(struct scsi_device
> *sdev)
> 
>         ufshcd_hpb_configure(hba, sdev);
> 
> +       blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
>         blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>         if (hba->quirks & UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE)
>                 blk_queue_update_dma_alignment(q, PAGE_SIZE - 1);


* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-09 23:52     ` Bart Van Assche
@ 2023-01-10  9:52       ` Niklas Cassel
  2023-01-10 11:54         ` Damien Le Moal
  0 siblings, 1 reply; 37+ messages in thread
From: Niklas Cassel @ 2023-01-10  9:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, Jens Axboe, linux-block, Jaegeuk Kim, Avri Altman

On Mon, Jan 09, 2023 at 03:52:23PM -0800, Bart Van Assche wrote:
> On 1/9/23 15:38, Damien Le Moal wrote:
> > On 1/10/23 08:27, Bart Van Assche wrote:
> > > +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
> > > +{
> > > +	switch (req_op(rq)) {
> > > +	case REQ_OP_WRITE:
> > > +	case REQ_OP_WRITE_ZEROES:
> > 
> > REQ_OP_ZONE_APPEND ?
> 
> I will add REQ_OP_ZONE_APPEND.
> 

Hello Bart, Damien,

+       if (blk_queue_pipeline_zoned_writes(rq->q) &&
+           blk_rq_is_seq_zone_write(rq))
+               cmd->allowed += rq->q->nr_requests;

Considering that this function, blk_rq_is_seq_zone_write(), only seems to
be used to determine if a request should be allowed to be retried, I think
that it is incorrect to add REQ_OP_ZONE_APPEND, since a zone append
operation will never result in an ILLEGAL REQUEST/UNALIGNED WRITE COMMAND.

(If this instead were a function that said which operations need to
be held back, then you would probably need to include REQ_OP_ZONE_APPEND,
as otherwise the reordered+retried write would never be able to succeed.)
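
For reference, a sketch of the retry-oriented variant I have in mind
(reconstructed from the patch context, so the real code may differ):

static inline bool blk_rq_is_seq_zone_write(struct request *rq)
{
	/*
	 * Only operations that can fail with UNALIGNED WRITE COMMAND;
	 * REQ_OP_ZONE_APPEND is deliberately left out because it never
	 * triggers that sense code.
	 */
	switch (req_op(rq)) {
	case REQ_OP_WRITE:
	case REQ_OP_WRITE_ZEROES:
		return blk_rq_zone_is_seq(rq);
	default:
		return false;
	}
}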


Kind regards,
Niklas


* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-10  9:52       ` Niklas Cassel
@ 2023-01-10 11:54         ` Damien Le Moal
  2023-01-10 12:13           ` Niklas Cassel
  0 siblings, 1 reply; 37+ messages in thread
From: Damien Le Moal @ 2023-01-10 11:54 UTC (permalink / raw)
  To: Niklas Cassel, Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Avri Altman

On 1/10/23 18:52, Niklas Cassel wrote:
> On Mon, Jan 09, 2023 at 03:52:23PM -0800, Bart Van Assche wrote:
>> On 1/9/23 15:38, Damien Le Moal wrote:
>>> On 1/10/23 08:27, Bart Van Assche wrote:
>>>> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
>>>> +{
>>>> +	switch (req_op(rq)) {
>>>> +	case REQ_OP_WRITE:
>>>> +	case REQ_OP_WRITE_ZEROES:
>>>
>>> REQ_OP_ZONE_APPEND ?
>>
>> I will add REQ_OP_ZONE_APPEND.
>>
> 
> Hello Bart, Damien,
> 
> +       if (blk_queue_pipeline_zoned_writes(rq->q) &&
> +           blk_rq_is_seq_zone_write(rq))
> +               cmd->allowed += rq->q->nr_requests;
> 
> Considering that this function, blk_rq_is_seq_zone_write(), only seems to
> be used to determine if a request should be allowed to be retried, I think
> that it is incorrect to add REQ_OP_ZONE_APPEND, since a zone append
> operation will never result in an ILLEGAL REQUEST/UNALIGNED WRITE COMMAND.
> 
> (If this instead were a function that said which operations need to
> be held back, then you would probably need to include REQ_OP_ZONE_APPEND,
> as otherwise the reordered+retried write would never be able to succeed.)

Unless UFS defines a zone append operation, REQ_OP_ZONE_APPEND will be
processed using regular writes in the sd driver.

I suggested adding it given the name of the function. REQ_OP_ZONE_APPEND
is a zone write operation for sequential zones...


> 
> 
> Kind regards,
> Niklas

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-10 11:54         ` Damien Le Moal
@ 2023-01-10 12:13           ` Niklas Cassel
  2023-01-10 12:41             ` Damien Le Moal
  0 siblings, 1 reply; 37+ messages in thread
From: Niklas Cassel @ 2023-01-10 12:13 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Bart Van Assche, Jens Axboe, linux-block, Jaegeuk Kim, Avri Altman

On Tue, Jan 10, 2023 at 08:54:24PM +0900, Damien Le Moal wrote:
> On 1/10/23 18:52, Niklas Cassel wrote:
> > On Mon, Jan 09, 2023 at 03:52:23PM -0800, Bart Van Assche wrote:
> >> On 1/9/23 15:38, Damien Le Moal wrote:
> >>> On 1/10/23 08:27, Bart Van Assche wrote:
> >>>> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
> >>>> +{
> >>>> +	switch (req_op(rq)) {
> >>>> +	case REQ_OP_WRITE:
> >>>> +	case REQ_OP_WRITE_ZEROES:
> >>>
> >>> REQ_OP_ZONE_APPEND ?
> >>
> >> I will add REQ_OP_ZONE_APPEND.
> >>
> > 
> > Hello Bart, Damien,
> > 
> > +       if (blk_queue_pipeline_zoned_writes(rq->q) &&
> > +           blk_rq_is_seq_zone_write(rq))
> > +               cmd->allowed += rq->q->nr_requests;
> > 
> > Considering that this function, blk_rq_is_seq_zone_write(), only seems to
> > be used to determine if a request should be allowed to be retried, I think
> > that it is incorrect to add REQ_OP_ZONE_APPEND, since a zone append
> > operation will never result in an ILLEGAL REQUEST/UNALIGNED WRITE COMMAND.
> > 
> > (If this instead were a function that said which operations need to
> > be held back, then you would probably need to include REQ_OP_ZONE_APPEND,
> > as otherwise the reordered+retried write would never be able to succeed.)
> 
> Unless UFS defines a zone append operation, REQ_OP_ZONE_APPEND will be
> processed using regular writes in the sd driver.

Sure, but I still think that my point is valid.

A REQ_OP_ZONE_APPEND should never be able to result in an
"UNALIGNED WRITE COMMAND".

If the SCSI REQ_OP_ZONE_APPEND emulation can result in an
"UNALIGNED WRITE COMMAND", I would argue that the SCSI zone append
emulation is faulty.


Kind regards,
Niklas


* Re: [PATCH 8/8] scsi: ufs: Enable zoned write pipelining
  2023-01-09 23:27 ` [PATCH 8/8] scsi: ufs: Enable zoned write pipelining Bart Van Assche
  2023-01-10  9:16   ` Avri Altman
@ 2023-01-10 12:23   ` Bean Huo
  2023-01-10 17:41     ` Bart Van Assche
  1 sibling, 1 reply; 37+ messages in thread
From: Bean Huo @ 2023-01-10 12:23 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Martin K . Petersen,
	Damien Le Moal

Bart,

So few changes to the UFS and SCSI layers! Does this require a VPD probe
to check whether the UFS device is a zoned block device, and the zone size?

Kind regards,

Bean

On 10.01.23 12:27 AM, Bart Van Assche wrote:
> From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
>
> From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
>
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
>     pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
>
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
>
> Note: for legacy mode this is only correct if the host submits one
> command at a time. The UFS driver does this.
>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index e18c9f4463ec..9198505e953b 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5031,6 +5031,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
>   
>   	ufshcd_hpb_configure(hba, sdev);
>   
> +	blk_queue_flag_set(QUEUE_FLAG_PIPELINE_ZONED_WRITES, q);
>   	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>   	if (hba->quirks & UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE)
>   		blk_queue_update_dma_alignment(q, PAGE_SIZE - 1);


* Re: [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function
  2023-01-10 12:13           ` Niklas Cassel
@ 2023-01-10 12:41             ` Damien Le Moal
  0 siblings, 0 replies; 37+ messages in thread
From: Damien Le Moal @ 2023-01-10 12:41 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Bart Van Assche, Jens Axboe, linux-block, Jaegeuk Kim, Avri Altman

On 1/10/23 21:13, Niklas Cassel wrote:
> On Tue, Jan 10, 2023 at 08:54:24PM +0900, Damien Le Moal wrote:
>> On 1/10/23 18:52, Niklas Cassel wrote:
>>> On Mon, Jan 09, 2023 at 03:52:23PM -0800, Bart Van Assche wrote:
>>>> On 1/9/23 15:38, Damien Le Moal wrote:
>>>>> On 1/10/23 08:27, Bart Van Assche wrote:
>>>>>> +static inline bool blk_rq_is_seq_zone_write(struct request *rq)
>>>>>> +{
>>>>>> +	switch (req_op(rq)) {
>>>>>> +	case REQ_OP_WRITE:
>>>>>> +	case REQ_OP_WRITE_ZEROES:
>>>>>
>>>>> REQ_OP_ZONE_APPEND ?
>>>>
>>>> I will add REQ_OP_ZONE_APPEND.
>>>>
>>>
>>> Hello Bart, Damien,
>>>
>>> +       if (blk_queue_pipeline_zoned_writes(rq->q) &&
>>> +           blk_rq_is_seq_zone_write(rq))
>>> +               cmd->allowed += rq->q->nr_requests;
>>>
>>> Considering that this function, blk_rq_is_seq_zone_write(), only seems to
>>> be used to determine if a request should be allowed to be retried, I think
>>> that it is incorrect to add REQ_OP_ZONE_APPEND, since a zone append
>>> operation will never result in an ILLEGAL REQUEST/UNALIGNED WRITE COMMAND.
>>>
>>> (If this instead were a function that said which operations need to
>>> be held back, then you would probably need to include REQ_OP_ZONE_APPEND,
>>> as otherwise the reordered+retried write would never be able to succeed.)
>>
>> Unless UFS defines a zone append operation, REQ_OP_ZONE_APPEND will be
>> processed using regular writes in the sd driver.
> 
> Sure, but I still think that my point is valid.
> 
> A REQ_OP_ZONE_APPEND should never be able to result in an
> "UNALIGNED WRITE COMMAND".

Yes, but that semantic should not be associated with a function named
blk_rq_is_seq_zone_write() :)

> 
> If the SCSI REQ_OP_ZONE_APPEND emulation can result in an
> "UNALIGNED WRITE COMMAND", I would argue that the SCSI zone append
> emulation is faulty.

or a passthrough command was used and screwed up the zone write pointer
tracking....

> 
> 
> Kind regards,
> Niklas

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 8/8] scsi: ufs: Enable zoned write pipelining
  2023-01-10 12:23   ` Bean Huo
@ 2023-01-10 17:41     ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10 17:41 UTC (permalink / raw)
  To: Bean Huo, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Avri Altman, Martin K . Petersen,
	Damien Le Moal

On 1/10/23 04:23, Bean Huo wrote:
> So few changes to the UFS and SCSI layers! Does this require a VPD probe
> to check whether the UFS device is a zoned block device, and the zone size?

Hi Bean,

I think it is safe to set QUEUE_FLAG_PIPELINE_ZONED_WRITES for any UFS 
logical unit - zoned or not. The changes in the SCSI disk (sd) driver 
are such that the number of retries is only adjusted for zoned logical 
units. Only zoned logical units should report the "UNALIGNED WRITE 
COMMAND" sense code.

As you may have noticed, Damien Le Moal plans to implement a new 
approach - a single queue per zoned logical unit for all write commands.

Thanks,

Bart.



* Re: [PATCH 8/8] scsi: ufs: Enable zoned write pipelining
  2023-01-10  9:16   ` Avri Altman
@ 2023-01-10 17:42     ` Bart Van Assche
  0 siblings, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2023-01-10 17:42 UTC (permalink / raw)
  To: Avri Altman, Jens Axboe
  Cc: linux-block, Jaegeuk Kim, Martin K . Petersen, Damien Le Moal

On 1/10/23 01:16, Avri Altman wrote:
> Please include scsi lkml as well to allow ufs stakeholders to be aware and comment.

Hi Avri,

I will do that when reposting this patch series.

As you may have noticed, Damien Le Moal plans to implement a new
approach - a single queue per zoned logical unit for all write commands.
I will hold off on reworking this patch series until that approach has
been implemented.

Thanks,

Bart.




end of thread

Thread overview: 37+ messages
2023-01-09 23:27 [PATCH 0/8] Enable zoned write pipelining for UFS devices Bart Van Assche
2023-01-09 23:27 ` [PATCH 1/8] block: Document blk_queue_zone_is_seq() and blk_rq_zone_is_seq() Bart Van Assche
2023-01-09 23:36   ` Damien Le Moal
2023-01-09 23:27 ` [PATCH 2/8] block: Introduce the blk_rq_is_seq_zone_write() function Bart Van Assche
2023-01-09 23:38   ` Damien Le Moal
2023-01-09 23:52     ` Bart Van Assche
2023-01-10  9:52       ` Niklas Cassel
2023-01-10 11:54         ` Damien Le Moal
2023-01-10 12:13           ` Niklas Cassel
2023-01-10 12:41             ` Damien Le Moal
2023-01-09 23:27 ` [PATCH 3/8] block: Introduce a request queue flag for pipelining zoned writes Bart Van Assche
2023-01-09 23:27 ` [PATCH 4/8] block/mq-deadline: Only use zone locking if necessary Bart Van Assche
2023-01-09 23:46   ` Damien Le Moal
2023-01-09 23:51     ` Bart Van Assche
2023-01-09 23:56       ` Damien Le Moal
2023-01-10  0:19         ` Bart Van Assche
2023-01-10  0:32           ` Damien Le Moal
2023-01-10  0:38             ` Jens Axboe
2023-01-10  0:41               ` Jens Axboe
2023-01-10  0:44                 ` Bart Van Assche
2023-01-10  0:48                   ` Jens Axboe
2023-01-10  0:56                     ` Bart Van Assche
2023-01-10  1:03                       ` Jens Axboe
2023-01-10  1:17                         ` Bart Van Assche
2023-01-10  1:48                           ` Jens Axboe
2023-01-10  2:24                     ` Damien Le Moal
2023-01-10  3:00                       ` Jens Axboe
2023-01-09 23:27 ` [PATCH 5/8] block/null_blk: Refactor null_queue_rq() Bart Van Assche
2023-01-09 23:27 ` [PATCH 6/8] block/null_blk: Add support for pipelining zoned writes Bart Van Assche
2023-01-09 23:27 ` [PATCH 7/8] scsi: Retry unaligned " Bart Van Assche
2023-01-09 23:51   ` Damien Le Moal
2023-01-09 23:55     ` Bart Van Assche
2023-01-09 23:27 ` [PATCH 8/8] scsi: ufs: Enable zoned write pipelining Bart Van Assche
2023-01-10  9:16   ` Avri Altman
2023-01-10 17:42     ` Bart Van Assche
2023-01-10 12:23   ` Bean Huo
2023-01-10 17:41     ` Bart Van Assche
