From: Andreas Hindborg <andreas.hindborg@wdc.com>
To: Andreas Hindborg <andreas.hindborg@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: Reordering of ublk IO requests
Date: Fri, 18 Nov 2022 13:46:48 +0100	[thread overview]
Message-ID: <87mt8o77hc.fsf@wdc.com> (raw)
In-Reply-To: <87v8nc79cv.fsf@wdc.com>


Andreas Hindborg <andreas.hindborg@wdc.com> writes:

[...]
>>> >
>>> > oops, I missed the single queue depth point per zone, so ublk won't break
>>> > zoned writes at all, and I agree the order of batched IOs is one problem,
>>> > but not hard to solve.
>>>
>>> The current implementation _does_ break zoned write because it reverses
>>> batched writes. But if it is an easy fix, that is cool :)
>>
>> Please look at Damien's comment:
>>
>>>> That lock is already there and using it, mq-deadline will never dispatch
>>>> more than one write per zone at any time. This is to avoid write
>>>> reordering. So multi queue or not, for any zone, there is no possibility
>>>> of having writes reordered.
>>
>> For zoned write, mq-deadline is used to limit at most one inflight write
>> for each zone.
>>
>> So can you explain a bit how the current implementation breaks zoned
>> write?
>
> Like Damien wrote in another email, mq-deadline will only impose
> ordering for requests submitted in batch. The flow we have is the
> following:
>
>  - Userspace sends requests to ublk gendisk
>  - Requests go through the block layer and are _not_ reordered when
>    using mq-deadline. They may be split.
>  - Requests hit ublk_drv, which reverses the order of _all_ batched-up
>    requests (including split requests).
>  - ublk_drv sends requests to ublksrv in _reverse_ order.
>  - ublksrv sends requests to the target device _without_ batching them
>    up.
>  - Requests that enter mq-deadline at the same time are reordered in
>    LBA order; that is all good.
>  - Requests that enter the kernel in different batches are not reordered
>    in LBA order and end up missing the write pointer. This is bad.
>
> So, ublk_drv is not functional for zoned storage as it stands. Either
> we fix up the ordering in userspace in ublksrv, which _will_ have a
> performance impact, or we fix the bug in ublk_drv that causes batched
> requests to be _reversed_.
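The reversal described above comes from llist's push semantics: llist_add() pushes at the head, so a consumer walking the detached list sees entries in reverse submission order until llist_reverse_order() is applied. Here is a minimal userspace model of that behavior; the helper names mirror <linux/llist.h>, but this is a standalone sketch (struct toy_rq and drained_sector are made-up names for illustration), not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of <linux/llist.h>: llist_add() pushes at the head
 * (LIFO), so a drained list reads in reverse submission order unless
 * llist_reverse_order() is applied first. */

struct llist_node { struct llist_node *next; };
struct llist_head { struct llist_node *first; };

/* Toy request; llnode is the first member so a node pointer can be
 * cast straight back to the request. */
struct toy_rq {
	struct llist_node llnode;
	long sector;
};

static void llist_add(struct llist_node *n, struct llist_head *h)
{
	n->next = h->first;	/* push at head: last in, first out */
	h->first = n;
}

static struct llist_node *llist_del_all(struct llist_head *h)
{
	struct llist_node *first = h->first;

	h->first = NULL;	/* detach the whole batch at once */
	return first;
}

static struct llist_node *llist_reverse_order(struct llist_node *head)
{
	struct llist_node *prev = NULL;

	while (head) {
		struct llist_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/* Queue sectors 0, 8, 16 in submission order; return the sector at
 * position i after draining, reversing first if 'fix' is set. */
static long drained_sector(int i, int fix)
{
	static struct toy_rq rqs[3];
	struct llist_head q = { NULL };
	struct llist_node *n;
	int k;

	for (k = 0; k < 3; k++) {
		rqs[k].sector = 8 * k;
		llist_add(&rqs[k].llnode, &q);
	}
	n = llist_del_all(&q);
	if (fix)
		n = llist_reverse_order(n);
	while (i--)
		n = n->next;
	return ((struct toy_rq *)n)->sector;
}
```

Without the reverse step, the first request drained is the last one queued (sector 16), which is exactly the write pointer violation a zoned target would see; with it, sector 0 comes out first.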

Here is a suggestion for a fix. It needs work, but it illustrates the
idea.

From 48f54a2a83daf19dda3c928e6518ce4a3e443fcd Mon Sep 17 00:00:00 2001
From: Andreas Hindborg <andreas.hindborg@wdc.com>
Date: Fri, 18 Nov 2022 13:44:45 +0100
Subject: [PATCH] wip: Do not reorder requests in ublk

---
 drivers/block/ublk_drv.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6a4a94b4cdf4..4fb5ccd01202 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -9,6 +9,7 @@
  *
  * (part of code stolen from loop.c)
  */
+#include <linux/llist.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>
@@ -55,6 +56,7 @@
 #define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
 
 struct ublk_rq_data {
+	struct llist_node llnode;
 	struct callback_head work;
 };
 
@@ -121,6 +123,7 @@ struct ublk_queue {
 	unsigned int max_io_sz;
 	bool abort_work_pending;
 	unsigned short nr_io_ready;	/* how many ios setup */
+	struct llist_head pdu_queue;
 	struct ublk_device *dev;
 	struct ublk_io ios[0];
 };
@@ -724,8 +727,15 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
 	struct ublk_rq_data *data = container_of(work,
 			struct ublk_rq_data, work);
 	struct request *req = blk_mq_rq_from_pdu(data);
+	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 
-	__ublk_rq_task_work(req);
+	/* Sometimes this list is empty, but that is OK */
+	struct llist_node *head = llist_del_all(&ubq->pdu_queue);
+	head = llist_reverse_order(head);
+	llist_for_each_entry(data, head, llnode) {
+		req = blk_mq_rq_from_pdu(data);
+		__ublk_rq_task_work(req);
+	}
 }
 
 static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
@@ -753,6 +763,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 		enum task_work_notify_mode notify_mode = bd->last ?
 			TWA_SIGNAL_NO_IPI : TWA_NONE;
 
+		llist_add(&data->llnode, &ubq->pdu_queue);
 		if (task_work_add(ubq->ubq_daemon, &data->work, notify_mode))
 			goto fail;
 	} else {
@@ -1170,6 +1181,9 @@ static int ublk_init_queue(struct ublk_device *ub, int q_id)
 
 	ubq->io_cmd_buf = ptr;
 	ubq->dev = ub;
+
+	init_llist_head(&ubq->pdu_queue);
+
 	return 0;
 }
 
-- 
2.38.1
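For what it's worth, the drain loop in the patched ublk_rq_task_work_fn() can be modeled in userspace too: every callback detaches the whole pending list and walks it in submission order, so a callback that races with an earlier one and finds the list already drained (the empty-list case noted in the patch comment) is a harmless no-op. A hypothetical sketch, with made-up names:

```c
#include <assert.h>
#include <stddef.h>

/* Standalone model of the patched task-work handler: each invocation
 * detaches everything queued so far and handles it in submission
 * order; a second invocation finding the list empty does nothing. */

struct node { struct node *next; };
struct head { struct node *first; };

static void push(struct node *n, struct head *h)
{
	n->next = h->first;	/* models llist_add() */
	h->first = n;
}

static struct node *reverse(struct node *n)
{
	struct node *prev = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/* Drain and "handle" everything queued; return how many were handled. */
static int drain(struct head *h)
{
	struct node *n = h->first;
	int handled = 0;

	h->first = NULL;		/* models llist_del_all() */
	for (n = reverse(n); n; n = n->next)
		handled++;		/* models __ublk_rq_task_work(req) */
	return handled;
}

static int demo(void)
{
	struct node a, b, c;
	struct head h = { NULL };
	int first, second;

	push(&a, &h);
	push(&b, &h);
	push(&c, &h);
	first = drain(&h);	/* handles all three */
	second = drain(&h);	/* list already empty: no-op */
	return first * 10 + second;
}
```

This is why queuing one task_work per request is tolerable with this scheme: extra callbacks that lose the race simply see an empty list.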



