All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Andreas Hindborg <andreas.hindborg@wdc.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, ming.lei@redhat.com
Subject: Re: Reordering of ublk IO requests
Date: Fri, 18 Nov 2022 19:28:48 +0800	[thread overview]
Message-ID: <Y3dscKle5oqLjSNT@T590> (raw)
In-Reply-To: <8735ag8ueg.fsf@wdc.com>

On Fri, Nov 18, 2022 at 10:41:31AM +0100, Andreas Hindborg wrote:
> 
> Ming Lei <ming.lei@redhat.com> writes:
> 
> > CAUTION: This email originated from outside of Western Digital. Do not click on
> > links or open attachments unless you recognize the sender and know that the
> > content is safe.
> >
> >
> > On Fri, Nov 18, 2022 at 01:35:29PM +0900, Damien Le Moal wrote:
> >> On 11/18/22 13:12, Ming Lei wrote:
> >> [...]
> >> >>> You can only assign it to zoned write request, but you still have to check
> >> >>> the sequence inside each zone, right? Then why not just check LBAs in
> >> >>> each zone simply?
> >> >>
> >> >> We would need to know the zone map, which is not otherwise required.
> >> >> Then we would need to track the write pointer for each open zone for
> >> >> each queue, so that we can stall writes that are not issued at the write
> >> >> pointer. This is in effect all zones, because we cannot track when zones
> >> >> are implicitly closed. Then, if different queues are issuing writes to
> >> >
> >> > Can you explain "implicitly closed" state a bit?
> >> >
> >> > From https://zonedstorage.io/docs/introduction/zoned-storage, only the
> >> > following words are mentioned about closed state:
> >> >
> >> >     ```Conversely, implicitly or explicitly opened zoned can be transitioned to the
> >> >     closed state using the CLOSE ZONE command.```
> >>
> >> When a write is issued to an empty or closed zone, the drive will
> >> automatically transition the zone into the implicit open state. This is
> >> called implicit open because the host did not (explicitly) issue an open
> >> zone command.
> >>
> >> When there are too many implicitly open zones, the drive may choose to
> >> close one of the implicitly opened zone to implicitly open the zone that
> >> is a target for a write command.
> >>
> >> Simple in a nutshell. This is done so that the drive can work with a
> >> limited set of resources needed to handle open zones, that is, zones that
> >> are being written. There are some more nasty details to all this with
> >> limits on the number of open zones and active zones that a zoned drive may
> >> have.
> >
> > OK, thanks for the clarification about implicitly closed, but I
> > understand this close can't change the zone's write pointer.
> 
> You are right, it does not matter if the zone is implicitly closed, I
> was mistaken. But we still have to track the write pointer of every zone
> in open or active state, otherwise we cannot know if a write that arrive
> to a zone with no outstanding IO is actually at the write pointer, or
> whether we need to hold it.
> 
> >
> >>
> >> >
> >> > zone info can be cached in the mapping(hash table)(zone sector is the key, and zone
> >> > info is the value), which can be implemented as one LRU style. If any zone
> >> > info isn't hit in the mapping table, ioctl(BLKREPORTZONE) can be called for
> >> > obtaining the zone info.
> >> >
> >> >> the same zone, we need to sync across queues. Userspace may have
> >> >> synchronization in place to issue writes with multiple threads while
> >> >> still hitting the write pointer.
> >> >
> >> > You can trust mq-dealine, which guaranteed that write IO is sent to ->queue_rq()
> >> > in order, no matter MQ or SQ.
> >> >
> >> > Yes, it could be issue from multiple queues for ublksrv, which doesn't sync
> >> > among multiple queues.
> >> >
> >> > But per-zone re-order still can solve the issue, just need one lock
> >> > for each zone to cover the MQ re-order.
> >>
> >> That lock is already there and using it, mq-deadline will never dispatch
> >> more than one write per zone at any time. This is to avoid write
> >> reordering. So multi queue or not, for any zone, there is no possibility
> >> of having writes reordered.
> >
> > oops, I miss the single queue depth point per zone, so ublk won't break
> > zoned write at all, and I agree order of batch IOs is one problem, but
> > not hard to solve.
> 
> The current implementation _does_ break zoned write because it reverses
> batched writes. But if it is an easy fix, that is cool :)

Please look at Damien's comment:

>> That lock is already there and using it, mq-deadline will never dispatch
>> more than one write per zone at any time. This is to avoid write
>> reordering. So multi queue or not, for any zone, there is no possibility
>> of having writes reordered.

For zoned write, mq-deadline is used to limit at most one inflight write
for each zone.

So can you explain a bit how the current implementation breaks zoned
write?


Thanks, 
Ming


  reply	other threads:[~2022-11-18 11:29 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-16 15:00 Reordering of ublk IO requests Andreas Hindborg
2022-11-17  2:18 ` Ming Lei
2022-11-17  8:05   ` Andreas Hindborg
2022-11-17  8:52     ` Ming Lei
2022-11-17  9:07       ` Andreas Hindborg
2022-11-17 11:47         ` Ming Lei
2022-11-17 11:59           ` Andreas Hindborg
2022-11-17 13:11             ` Damien Le Moal
2022-11-17 13:31               ` Andreas Hindborg
2022-11-18  1:51                 ` Damien Le Moal
2022-11-18  9:29                   ` Andreas Hindborg
2022-11-18  4:12             ` Ming Lei
2022-11-18  4:35               ` Damien Le Moal
2022-11-18  6:07                 ` Ming Lei
2022-11-18  9:41                   ` Andreas Hindborg
2022-11-18 11:28                     ` Ming Lei [this message]
2022-11-18 11:49                       ` Andreas Hindborg
2022-11-18 12:46                         ` Andreas Hindborg
2022-11-18 12:47                         ` Ming Lei
2022-11-19  0:24                           ` Damien Le Moal
2022-11-19  7:36                             ` Andreas Hindborg
2022-11-21 10:15                               ` Andreas Hindborg
2022-11-20 14:37                             ` Ming Lei
2022-11-21  1:25                               ` Damien Le Moal
2022-11-21  8:03                             ` Christoph Hellwig
2022-11-21  8:13                               ` Ming Lei
2022-11-17 13:00         ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3dscKle5oqLjSNT@T590 \
    --to=ming.lei@redhat.com \
    --cc=andreas.hindborg@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.