From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Kanchan Joshi <joshiiitr@gmail.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Kanchan Joshi <joshi.k@samsung.com>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"bcrl@kvack.org" <bcrl@kvack.org>,
	Matthew Wilcox <willy@infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-aio@kvack.org" <linux-aio@kvack.org>,
	"io-uring@vger.kernel.org" <io-uring@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	SelvaKumar S <selvakuma.s1@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>,
	Javier Gonzalez <javier.gonz@samsung.com>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>
Subject: Re: [PATCH v4 6/6] io_uring: add support for zone-append
Date: Tue, 29 Sep 2020 01:24:02 +0000
Message-ID: <CY4PR04MB3751BFF86D1F7F1D22A143E6E7320@CY4PR04MB3751.namprd04.prod.outlook.com>
In-Reply-To: CA+1E3rJANOsPOzjtJHSViVMq+Uc-sB0iZoExxBG++v2ghaL4uA@mail.gmail.com

On 2020/09/29 3:58, Kanchan Joshi wrote:
[...]
> ZoneFS is better when it is about dealing at single-zone granularity,
> and direct-block seems better when it is about grouping zones (in
> various ways including striping). The latter case (i.e. grouping
> zones) requires more involved mapping, and I agree that it can be left
> to application (for both ZoneFS and raw-block backends).
> But when an application tries that on ZoneFS, apart from mapping there
> would be additional cost of indirection/fd-management (due to
> file-on-files).

There is no indirection in zonefs. fd-to-struct file/inode conversion is very
fast and happens for every system call anyway, regardless of what the fd
represents. So I really do not understand what your worry is here. If you are
worried about overhead/performance, then please show numbers. If something is
wrong, we can work on fixing it.

> And if new features (zone-append for now) are available only on
> ZoneFS, it forces application to use something that may not be most
> optimal for its need.

"may" is not enough to convince me...

> Coming to the original problem of plumbing append - I think divergence
> started because RWF_APPEND did not have any meaning for block device.
> Did I miss any other reason?

Correct.

> How about write-anywhere semantics (RWF_RELAXED_WRITE or
> RWF_ANONYMOUS_WRITE flag) on block-dev.

"Write-anywhere"? What do you mean? That is not possible on zoned devices,
even with zone append, since you at least need to guarantee that the target
zone has enough unwritten space to accept an append command.
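
As a rough illustration of that constraint, here is a toy user-space model in
Python. It is not kernel code and not a real API: the Zone class and the
zone_append() helper are hypothetical names made up for this sketch, and sizes
are counted in blocks. The caller picks the zone, the modelled device picks the
location inside it (the current write pointer), and the append is rejected when
the zone does not have enough unwritten space left.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy model of one sequential-write zone (sizes in blocks)."""
    start: int       # first block of the zone
    capacity: int    # number of writable blocks in the zone
    wp: int = 0      # write pointer, relative to start


def zone_append(zone: Zone, nr_blocks: int) -> int:
    """Append to the zone chosen by the caller; return where the data landed.

    The caller picks the zone. The modelled device picks the location
    inside it, which is always the current write pointer.
    """
    if nr_blocks <= 0:
        raise ValueError("nr_blocks must be positive")
    if zone.wp + nr_blocks > zone.capacity:
        # Not enough unwritten space left: the append cannot be accepted.
        raise OSError("zone has insufficient unwritten space")
    written_at = zone.start + zone.wp
    zone.wp += nr_blocks
    return written_at
```

In this model, two appends to the same zone land back to back, and a third one
that would exceed the zone capacity fails without moving the write pointer.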

> Zone-append works a lot like write-anywhere on block-dev (or on any
> other file that combines multiple-zones, in non-sequential fashion).

That is an over-simplification that is not helpful at all. Zone append is not
"write anywhere" at all. And "write anywhere" is not a concept that exists on
regular block devices anyway. Writes only go to the offset that the user
decided, through lseek(), pwrite() or aio->aio_offset. It is not like the block
layer decides where the writes land. The same constraint applies to zone append:
the user decides the target zone. That is not "anywhere". Please be precise with
wording and implied/desired semantics. Narrow down the scope of your concept
names for clarity.
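
A minimal sketch of the "user decides the offset" point, in Python for brevity.
It assumes a temporary regular file as a stand-in for a block device, and the
function name is made up for this example. Both write paths take their offset
from the caller: pwrite() as an explicit argument, and write() from the file
position that the caller set with lseek().

```python
import os
import tempfile


def write_at_explicit_offsets() -> bytes:
    """Write two chunks at caller-chosen offsets and return the file content."""
    fd, path = tempfile.mkstemp()
    try:
        # pwrite(): the offset is an explicit argument of the call.
        os.pwrite(fd, b"BBBB", 4)
        # lseek() + write(): the offset is the file position set by the caller.
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"AAAA")
        # Read back the first 8 bytes, starting at offset 0.
        return os.pread(fd, 8, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

The second chunk is issued first but still ends up after the first one in the
file, because each write lands exactly where the caller asked.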

And talking about "file that combines multiple-zones" would mean that we are now
back in FS land, not raw block device file accesses anymore. So which one are we
talking about? It looks like you are conflating what the application does, and
how it uses whatever interface to the device is available, with what that
interface actually is. It is very confusing.

>>> Also it seems difficult (compared to block dev) to fit simple-copy TP
>>> in ZoneFS. The new
>>> command needs: one NVMe drive, list of source LBAs and one destination
>>> LBA. In ZoneFS, we would deal with N+1 file-descriptors (N source zone
>>> file, and one destination zone file) for that. While with block
>>> interface, we do not need  more than one file-descriptor representing
>>> the entire device. With more zone-files, we face open/close overhead too.
>>
>> Are you expecting simple-copy to allow requests that are not zone aligned ? I do
>> not think that will ever happen. Otherwise, the gotcha cases for it would be far
>> too numerous. Simple-copy is essentially an optimized regular write command.
>> Similarly to that command, it will not allow copies over zone boundaries and
>> will need the destination LBA to be aligned to the destination zone WP. I have
>> not checked the TP though and given the NVMe NDA, I will stop the discussion here.
> 
> TP is ratified, if that is the problem you are referring to.

Ah. Yes. Got confused with ZRWA. Simple-copy is a different story anyway. Let's
not mix it into the zone append user interface discussion, please.

> 
>> filesend() could be used as the interface for simple-copy. Implementing that in
>> zonefs would not be that hard. What is your plan for simple-copy interface for
>> raw block device ? An ioctl ? filesend() too ? As with any other user level
>> API, we should not be restricted to a particular device type if we can avoid it,
>> so in-kernel emulation of the feature is needed for devices that do not have
>> simple-copy or scsi extended copy. filesend() seems to me like the best choice
>> since all of that is already implemented there.
> 
> At this moment, ioctl as sync and io-uring for async. sendfile() and
> copy_file_range() takes two fds....with that we can represent copy
> from one source zone to another zone.
> But it does not fit to represent larger copy (from N source zones to
> one destination zone).

NVMe passthrough? If that does not fit your use case, then think of an
interface, define its semantics, and propose it. But again, use a different
thread. This is mixing up zone-append and simple copy, which I do not think are
directly related.

> Not sure if I am clear, perhaps sending RFC would be better for
> discussion on simple-copy.

Please keep this discussion separate from zone append. Mixing two problems in
one thread does not help make progress.


-- 
Damien Le Moal
Western Digital Research
