Linux-api Archive on lore.kernel.org
 help / color / Atom feed
From: "hch@infradead.org" <hch@infradead.org>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "hch@infradead.org" <hch@infradead.org>,
	Kanchan Joshi <joshiiitr@gmail.com>, Jens Axboe <axboe@kernel.dk>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Kanchan Joshi <joshi.k@samsung.com>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"bcrl@kvack.org" <bcrl@kvack.org>,
	Matthew Wilcox <willy@infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-aio@kvack.org" <linux-aio@kvack.org>,
	"io-uring@vger.kernel.org" <io-uring@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	SelvaKumar S <selvakuma.s1@samsung.com>,
	Nitesh Shetty <nj.shetty@samsung.com>,
	Javier Gonzalez <javier.gonz@samsung.com>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>
Subject: Re: [PATCH v4 6/6] io_uring: add support for zone-append
Date: Fri, 14 Aug 2020 09:14:11 +0100
Message-ID: <20200814081411.GA16943@infradead.org> (raw)
In-Reply-To: <CY4PR04MB37517D633920E4D31AC6EA0DE74B0@CY4PR04MB3751.namprd04.prod.outlook.com>

On Wed, Aug 05, 2020 at 07:35:28AM +0000, Damien Le Moal wrote:
> > the write pointer.  The only interesting addition is that we also want
> > to report where we wrote.  So I'd rather have RWF_REPORT_OFFSET or so.
> 
> That works for me. But that rules out having the same interface for raw block
> devices since O_APPEND has no meaning in that case. So for raw block devices, it
> will have to be through zonefs. That works for me, and I think it was your idea
> all along. Can you confirm please ?

Yes.  I don't think think raw syscall level access to the zone append
primitive makes sense.  Either use zonefs for a file-like API, or
use the NVMe pass through interface for 100% raw access.

> >  - take the exclusive per-inode (zone) lock and just issue either normal
> >    writes or zone append at your choice, relying on the lock to
> >    serialize other writers.  For the async case this means we need a
> >    lock than can be release in a different context than it was acquired,
> >    which is a little ugly but can be done.
> 
> Yes, that would be possible. But likely, this will also need calls to
> inode_dio_wait() to avoid ending up with a mix of regular write and zone append
> writes in flight (which likely would result in the regular write failing as the
> zone append writes would go straight to the device without waiting for the zone
> write lock like regular writes do).

inode_dio_wait is a really bad implementation of almost a lock.  I've
started some work that I need to finish to just replace it with proper
non-owner rwsems (or even the range locks Dave has been looking into).

> 
> This all sound sensible to me. One last point though, specific to zonefs: if the
> user opens a zone file with O_APPEND, I do want to have that necessarily mean
> "use zone append". And same for the "RWF_REPORT_OFFSET". The point here is that
> both O_APPEND and RWF_REPORT_OFFSET can be used with both regular writes and
> zone append writes, but none of them actually clearly specify if the
> application/user tolerates writing data to disk in a different order than the
> issuing order... So another flag to indicate "atomic out-of-order writes" (==
> zone append) ?

O_APPEND pretty much implies out of order, as there is no way for an
application to know which thread wins the race to write the next chunk.

  reply index

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20200724155244epcas5p2902f57e36e490ee8772da19aa9408cdc@epcas5p2.samsung.com>
2020-07-24 15:49 ` [PATCH v4 0/6] zone-append support in io-uring and aio Kanchan Joshi
     [not found]   ` <CGME20200724155258epcas5p1a75b926950a18cd1e6c8e7a047e6c589@epcas5p1.samsung.com>
2020-07-24 15:49     ` [PATCH v4 1/6] fs: introduce FMODE_ZONE_APPEND and IOCB_ZONE_APPEND Kanchan Joshi
2020-07-24 16:34       ` Jens Axboe
2020-07-26 15:18       ` Christoph Hellwig
2020-07-28  1:49         ` Matthew Wilcox
2020-07-28  7:26           ` Christoph Hellwig
     [not found]   ` <CGME20200724155324epcas5p18e1d3b4402d1e4a8eca87d0b56a3fa9b@epcas5p1.samsung.com>
2020-07-24 15:49     ` [PATCH v4 2/6] fs: change ki_complete interface to support 64bit ret2 Kanchan Joshi
2020-07-26 15:18       ` Christoph Hellwig
     [not found]   ` <CGME20200724155329epcas5p345ba6bad0b8fe18056bb4bcd26c10019@epcas5p3.samsung.com>
2020-07-24 15:49     ` [PATCH v4 3/6] uio: return status with iov truncation Kanchan Joshi
     [not found]   ` <CGME20200724155341epcas5p15bfc55927f2abb60f19784270fe8e377@epcas5p1.samsung.com>
2020-07-24 15:49     ` [PATCH v4 4/6] block: add zone append handling for direct I/O path Kanchan Joshi
2020-07-26 15:19       ` Christoph Hellwig
     [not found]   ` <CGME20200724155346epcas5p2cfb383fe9904a45280c6145f4c13e1b4@epcas5p2.samsung.com>
2020-07-24 15:49     ` [PATCH v4 5/6] block: enable zone-append for iov_iter of bvec type Kanchan Joshi
2020-07-26 15:20       ` Christoph Hellwig
     [not found]   ` <CGME20200724155350epcas5p3b8f1d59eda7f8fbb38c828f692d42fd6@epcas5p3.samsung.com>
2020-07-24 15:49     ` [PATCH v4 6/6] io_uring: add support for zone-append Kanchan Joshi
2020-07-24 16:29       ` Jens Axboe
2020-07-27 19:16         ` Kanchan Joshi
2020-07-27 20:34           ` Jens Axboe
2020-07-30 16:08             ` Pavel Begunkov
2020-07-30 16:13               ` Jens Axboe
2020-07-30 16:26                 ` Pavel Begunkov
2020-07-30 17:16                   ` Jens Axboe
2020-07-30 17:38                     ` Pavel Begunkov
2020-07-30 17:51                       ` Kanchan Joshi
2020-07-30 17:54                         ` Jens Axboe
2020-07-30 18:25                           ` Kanchan Joshi
2020-07-31  6:42                             ` Damien Le Moal
2020-07-31  6:45                               ` hch
2020-07-31  6:59                                 ` Damien Le Moal
2020-07-31  7:58                                   ` Kanchan Joshi
2020-07-31  8:14                                     ` Damien Le Moal
2020-07-31  9:14                                       ` hch
2020-07-31  9:34                                         ` Damien Le Moal
2020-07-31  9:41                                           ` hch
2020-07-31 10:16                                             ` Damien Le Moal
2020-07-31 12:51                                               ` hch
2020-07-31 13:08                                                 ` hch
2020-07-31 15:07                                                   ` Kanchan Joshi
2020-08-05  7:35                                                 ` Damien Le Moal
2020-08-14  8:14                                                   ` hch [this message]
2020-08-14  8:27                                                     ` Damien Le Moal
2020-08-14 12:04                                                       ` hch
2020-08-14 12:20                                                         ` Damien Le Moal
2020-09-07  7:01                                                     ` Kanchan Joshi
2020-09-08 15:18                                                       ` hch
2020-09-24 17:19                                                         ` Kanchan Joshi
2020-09-25  2:52                                                           ` Damien Le Moal
2020-09-28 18:58                                                             ` Kanchan Joshi
2020-09-29  1:24                                                               ` Damien Le Moal
2020-09-29 18:49                                                                 ` Kanchan Joshi
2020-07-31  9:38                                       ` Kanchan Joshi
2020-07-31  7:08                               ` Kanchan Joshi
2020-07-30 15:57       ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200814081411.GA16943@infradead.org \
    --to=hch@infradead.org \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=io-uring@vger.kernel.org \
    --cc=javier.gonz@samsung.com \
    --cc=joshi.k@samsung.com \
    --cc=joshiiitr@gmail.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nj.shetty@samsung.com \
    --cc=selvakuma.s1@samsung.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-api Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-api/0 linux-api/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-api linux-api/ https://lore.kernel.org/linux-api \
		linux-api@vger.kernel.org
	public-inbox-index linux-api

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-api


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git