Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
To: Jan Kara <jack@suse.cz>
Cc: tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	hch@infradead.org, david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap infrastructure
Date: Wed, 9 Oct 2019 22:53:28 +1100
Message-ID: <20191009115326.GG14749@poseidon.bobrowski.net> (raw)
In-Reply-To: <20191008151238.GK5078@quack2.suse.cz>

On Tue, Oct 08, 2019 at 05:12:38PM +0200, Jan Kara wrote:
> On Thu 03-10-19 21:35:05, Matthew Bobrowski wrote:
> > Any blocks
> > that have been allocated in preparation for direct I/O write will be
> > reused by buffered I/O, so there's no issue with leaving allocated
> > blocks beyond EOF.
> 
> This actually is not true as ext4_truncate_failed_write() will trim blocks
> beyond EOF. Also this would not be 100% reliable as if we crash between DIO
> short write succeeding and buffered write happening, we would leave inode
> with blocks beyond EOF. So I'd just remove this sentence.

OK.

> > Existing direct I/O write buffer_head code has been removed as it's
> > now redundant.
> > 
> > Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
> ...
> > +static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size,
> > +                                int error, unsigned int flags)
> > +{
> > +       loff_t offset = iocb->ki_pos;
> > +       struct inode *inode = file_inode(iocb->ki_filp);
> > +
> > +       if (!error && (flags & IOMAP_DIO_UNWRITTEN))
> > +               error = ext4_convert_unwritten_extents(NULL, inode, offset,
> > +                                                      size);
> > +       return ext4_handle_inode_extension(inode, offset, error ? : size,
> > +                                          size);
> > +}
> 
> I was pondering about this and I don't think we have it quite correct.
> Still :-|. The problem is that iomap_dio_complete() will pass dio->size as
> 'size', which is the amount of submitted IO but not necessarily the amount
> of blocks that were mapped (that can be larger). Thus
> ext4_handle_inode_extension() can miss the fact that there are blocks
> beyond EOF that need to be trimmed. And we have no way of finding that out
> inside our ->end_io handler. Even iomap_dio_complete() doesn't have that
> information so we'd need to add 'original length' to struct iomap_dio and
> then pass it do ->end_io.

Yes, I remember having a discussion around this in the past. The
answer to this problem at the time was that any blocks that may have
been allocated in preparation for the direct I/O write and we're not
used would in fact be reused when we fell back to buffered I/O. The
case that you're describing above, based on my understanding, would
have to be a result of short write?

> Seeing how difficult it is when a filesystem wants to complete the iocb
> synchronously (regardless whether it is async or sync) and have all the
> information in one place for further processing, I think it would be the
> easiest to provide iomap_dio_rw_wait() that forces waiting for the iocb to
> complete *and* returns the appropriate return value instead of pretty
> useless EIOCBQUEUED. It is actually pretty trivial (patch attached). With
> this we can then just call iomap_dio_rw_sync() for the inode extension case
> with ->end_io doing just the unwritten extent processing and then call
> ext4_handle_inode_extension() from ext4_direct_write_iter() where we would
> have all the information we need.

This could also work, nicely. But, if this isn't an option for
whatever reason then we could go with what you suggested above? After
all, I think it would make sense to pass such information about the
write all the way through to the end, especially the ->end_io handler,
seeing as though that's we're clean up should be performed in the
instance of a failure, or a short write.

However, note that:

a) likely(my understanding may be wrong)

b) Someone a lot smarter than I has probably already thought this
   through and there's a real good reason why we don't cram such
   information about the write within the iomap structures and have
   them passed all the way through to the end...

:)

--<M>--

      parent reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-03 11:32 [PATCH v4 0/8] ext4: port direct I/O to " Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 1/8] ext4: move out iomap field population into separate helper Matthew Bobrowski
2019-10-08 10:27   ` Jan Kara
2019-10-09  8:57     ` Matthew Bobrowski
2019-10-09 13:06       ` Jan Kara
2019-10-10  5:39         ` Matthew Bobrowski
2019-10-09  6:02   ` Ritesh Harjani
2019-10-09  7:07     ` Christoph Hellwig
2019-10-09  7:50       ` Ritesh Harjani
2019-10-09  9:09         ` Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 2/8] ext4: move out IOMAP_WRITE path " Matthew Bobrowski
2019-10-08 10:31   ` Jan Kara
2019-10-09  9:18     ` Matthew Bobrowski
2019-10-09  6:22   ` Ritesh Harjani
2019-10-09  9:31     ` Matthew Bobrowski
2019-10-03 11:33 ` [PATCH v4 3/8] ext4: introduce new callback for IOMAP_REPORT operations Matthew Bobrowski
2019-10-08 10:42   ` Jan Kara
2019-10-09  9:41     ` Matthew Bobrowski
2019-10-09  6:00   ` Ritesh Harjani
2019-10-09 12:08     ` Matthew Bobrowski
2019-10-09 13:14       ` Ritesh Harjani
2019-10-03 11:34 ` [PATCH v4 4/8] ext4: introduce direct I/O read path using iomap infrastructure Matthew Bobrowski
2019-10-08 10:52   ` Jan Kara
2019-10-09 10:55     ` Matthew Bobrowski
2019-10-09  6:39   ` Ritesh Harjani
2019-10-03 11:34 ` [PATCH v4 5/8] ext4: move inode extension/truncate code out from ->iomap_end() callback Matthew Bobrowski
2019-10-08 11:25   ` Jan Kara
2019-10-09 10:18     ` Matthew Bobrowski
2019-10-09 12:51       ` Jan Kara
2019-10-10  5:44         ` Matthew Bobrowski
2019-10-09  6:27   ` Ritesh Harjani
2019-10-09 10:20     ` Matthew Bobrowski
2019-10-03 11:34 ` [PATCH v4 6/8] ext4: move inode extension checks out from ext4_iomap_alloc() Matthew Bobrowski
2019-10-08 11:27   ` Jan Kara
2019-10-09 10:21     ` Matthew Bobrowski
2019-10-09  6:30   ` Ritesh Harjani
2019-10-09 10:39     ` Matthew Bobrowski
2019-10-03 11:34 ` [PATCH v4 7/8] ext4: reorder map.m_flags checks in ext4_set_iomap() Matthew Bobrowski
2019-10-08 11:30   ` Jan Kara
2019-10-09 10:42     ` Matthew Bobrowski
2019-10-09  6:35   ` Ritesh Harjani
2019-10-09 10:43     ` Matthew Bobrowski
2019-10-03 11:35 ` [PATCH v4 8/8] ext4: introduce direct I/O write path using iomap infrastructure Matthew Bobrowski
2019-10-08 15:12   ` Jan Kara
2019-10-09  7:11     ` Christoph Hellwig
2019-10-09 13:41       ` Jan Kara
2019-10-09 11:53     ` Matthew Bobrowski [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191009115326.GG14749@poseidon.bobrowski.net \
    --to=mbobrowski@mbobrowski.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git