Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Jan Kara <jack@suse.cz>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>,
	Matthew Bobrowski <mbobrowski@mbobrowski.org>,
	tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap infrastructure
Date: Wed, 9 Oct 2019 15:41:23 +0200
Message-ID: <20191009134123.GE5050@quack2.suse.cz> (raw)
In-Reply-To: <20191009071145.GB32281@infradead.org>

On Wed 09-10-19 00:11:45, Christoph Hellwig wrote:
> On Tue, Oct 08, 2019 at 05:12:38PM +0200, Jan Kara wrote:
> > Seeing how difficult it is when a filesystem wants to complete the iocb
> > synchronously (regardless whether it is async or sync) and have all the
> > information in one place for further processing, I think it would be the
> > easiest to provide iomap_dio_rw_wait() that forces waiting for the iocb to
> > complete *and* returns the appropriate return value instead of pretty
> > useless EIOCBQUEUED. It is actually pretty trivial (patch attached). With
> > this we can then just call iomap_dio_rw_sync() for the inode extension case
> > with ->end_io doing just the unwritten extent processing and then call
> > ext4_handle_inode_extension() from ext4_direct_write_iter() where we would
> > have all the information we need.
> > 
> > Christoph, Darrick, what do you think about extending iomap like in the
> > attached patch (plus sample use in XFS)?
> 
> I vaguely remember suggesting something like this but Brian or Dave
> convinced me it wasn't a good idea.  This will require a trip to the
> xfs or fsdevel archives from when the inode_dio_wait was added in XFS.

I think you refer to this [1] message from Brian:

It's not quite that simple..

FWIW, the discussion (between Dave and I) for how best to solve this
started offline prior to sending the patch and pretty much started with
the idea of changing the async I/O to sync as you suggest here. I backed
off from that because it's too subtle given the semantics between the
higher level aio code and lower level dio code for async I/O. By that I
mean either can be responsible for calling the ->ki_complete() callback
in the iocb on I/O completion.

IOW, if we receive an async direct I/O, clear ->ki_complete() as you
describe above and submit it, the dio code will wait on I/O and return
the size of the I/O on successful completion. It will not have called
->ki_complete(), however. Rather, the >0 return value indicates that
aio_rw_done() must call ->ki_complete() after xfs_file_write_iter()
returns, but we would have already cleared the function pointer.

I think it is technically possible to use this technique by clearing and
restoring ->ki_complete(), but in general we've visited this "change the
I/O type" approach twice now and we've (collectively) got it wrong both
times (the first error in thinking was that XFS would need to call
->ki_complete()). IMO, this demonstrates that it's not worth the
complexity to insert ourselves into this dependency chain when we can
accomplish the same thing with a simple dio wait call.

---

Which is fair enough. I've been looking at the same and arrived at similar
conclusion ;) (BTW, funnily enough ocfs2 seems to do this dance with
clearing and restoring ki_complete). But what I propose is something
different - just wait for IO in iomap_dio_rw() which avoids the need to
clear & restore ->ki_complete() while at the same time while providing
fully-sync IO experience to the caller. So Brians objection doesn't apply
here.

> But if we decide it actully works this time around please don't add the
> __ variant but just add the parameter to iomap_dio_rw directly.

Yeah, I was undecided on this one as well. Will change this and post the
patches to fsdevel & xfs lists.

								Honza

[1] https://lore.kernel.org/linux-xfs/20190325135124.GD52167@bfoster/
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-03 11:32 [PATCH v4 0/8] ext4: port direct I/O to " Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 1/8] ext4: move out iomap field population into separate helper Matthew Bobrowski
2019-10-08 10:27   ` Jan Kara
2019-10-09  8:57     ` Matthew Bobrowski
2019-10-09 13:06       ` Jan Kara
2019-10-10  5:39         ` Matthew Bobrowski
2019-10-09  6:02   ` Ritesh Harjani
2019-10-09  7:07     ` Christoph Hellwig
2019-10-09  7:50       ` Ritesh Harjani
2019-10-09  9:09         ` Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 2/8] ext4: move out IOMAP_WRITE path " Matthew Bobrowski
2019-10-08 10:31   ` Jan Kara
2019-10-09  9:18     ` Matthew Bobrowski
2019-10-09  6:22   ` Ritesh Harjani
2019-10-09  9:31     ` Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 3/8] ext4: introduce new callback for IOMAP_REPORT operations Matthew Bobrowski
2019-10-08 10:42   ` Jan Kara
2019-10-09  9:41     ` Matthew Bobrowski
2019-10-09  6:00   ` Ritesh Harjani
2019-10-09 12:08     ` Matthew Bobrowski
2019-10-09 13:14       ` Ritesh Harjani
2019-10-03 11:34 ` [PATCH v4 4/8] ext4: introduce direct I/O read path using iomap infrastructure Matthew Bobrowski
2019-10-08 10:52   ` Jan Kara
2019-10-09 10:55     ` Matthew Bobrowski
2019-10-09  6:39   ` Ritesh Harjani
2019-10-03 11:34 ` [PATCH v4 5/8] ext4: move inode extension/truncate code out from ->iomap_end() callback Matthew Bobrowski
2019-10-08 11:25   ` Jan Kara
2019-10-09 10:18     ` Matthew Bobrowski
2019-10-09 12:51       ` Jan Kara
2019-10-10  5:44         ` Matthew Bobrowski
2019-10-09  6:27   ` Ritesh Harjani
2019-10-09 10:20     ` Matthew Bobrowski
2019-10-03 11:34 ` [PATCH v4 6/8] ext4: move inode extension checks out from ext4_iomap_alloc() Matthew Bobrowski
2019-10-08 11:27   ` Jan Kara
2019-10-09 10:21     ` Matthew Bobrowski
2019-10-09  6:30   ` Ritesh Harjani
2019-10-09 10:39     ` Matthew Bobrowski
2019-10-03 11:34 ` [PATCH v4 7/8] ext4: reorder map.m_flags checks in ext4_set_iomap() Matthew Bobrowski
2019-10-08 11:30   ` Jan Kara
2019-10-09 10:42     ` Matthew Bobrowski
2019-10-09  6:35   ` Ritesh Harjani
2019-10-09 10:43     ` Matthew Bobrowski
2019-10-03 11:35 ` [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap infrastructure Matthew Bobrowski
2019-10-08 15:12   ` Jan Kara
2019-10-09  7:11     ` Christoph Hellwig
2019-10-09 13:41       ` Jan Kara [this message]
2019-10-09 11:53     ` Matthew Bobrowski

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191009134123.GE5050@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mbobrowski@mbobrowski.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git