Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
To: RITESH HARJANI <riteshh@linux.ibm.com>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	jack@suse.cz, tytso@mit.edu
Subject: Re: [PATCH 4/5] ext4: introduce direct IO write code path using iomap infrastructure
Date: Tue, 13 Aug 2019 22:58:42 +1000
Message-ID: <20190813125840.GA10187@neo.Home> (raw)
In-Reply-To: <20190812170430.982E552051@d06av21.portsmouth.uk.ibm.com>

On Mon, Aug 12, 2019 at 10:34:29PM +0530, RITESH HARJANI wrote:
> > +	if (offset + count > i_size_read(inode) ||
> > +	    offset + count > EXT4_I(inode)->i_disksize) {
> > +		ext4_update_i_disksize(inode, inode->i_size);
> > +		extend = true;
> > +	}
> > +
> > +	ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, ext4_dio_write_end_io);
> > +
> > +	/*
> > +	 * Unaligned direct AIO must be the only IO in flight or else
> > +	 * any overlapping aligned IO after unaligned IO might result
> > +	 * in data corruption.
> > +	 */
> > +	if (ret == -EIOCBQUEUED && (unaligned_aio || extend))
> > +		inode_dio_wait(inode);
> 
> Could you please add explain & add a comment about why we wait in AIO DIO
> case
> when extend is true? As I see without iomap code this case was not present
> earlier.

Because while using the iomap infrastructure for AIO writes, we return to the
caller prior to invoking the ->end_io() handler. This callback is responsible
for performing the in-core/on-disk inode extension if it is deemed
necessary. If we don't wait in the case of an extend, we run the risk of
loosing inode size consistencies in addition to things leading to possible
data corruption...

> > +
> > +	if (ret >= 0 && iov_iter_count(from)) {
> > +		overwrite ? inode_unlock_shared(inode) : inode_unlock(inode);
> > +		return ext4_buffered_write_iter(iocb, from);
> > +	}
> should not we copy code from "__generic_file_write_iter" which does below?
> 
> 3436                 /*
> 3437                  * We need to ensure that the page cache pages are
> written to
> 3438                  * disk and invalidated to preserve the expected
> O_DIRECT
> 3439                  * semantics.
> 3440                  */

Hm, I don't see why this would be required seeing as though the page cache
invalidation semantics pre and post write are handled by iomap_dio_rw() and
iomap_dio_complete(). But, I could be completely wrong here, so we may need to
wait for some others to provide comments on this.

> > +			WARN_ON(!(flags & IOMAP_DIRECT));
> > +			if (round_down(offset, i_blocksize(inode)) >=
> > +			    i_size_read(inode)) {
> > +				ret = ext4_map_blocks(handle, inode, &map,
> > +						      EXT4_GET_BLOCKS_CREATE);
> > +			} else if (!ext4_test_inode_flag(inode,
> > +							 EXT4_INODE_EXTENTS)) {
> > +				/*
> > +				 * We cannot fill holes in indirect
> > +				 * tree based inodes as that could
> > +				 * expose stale data in the case of a
> > +				 * crash. Use magic error code to
> > +				 * fallback to buffered IO.
> > +				 */
> > +				ret = ext4_map_blocks(handle, inode, &map, 0);
> > +				if (ret == 0)
> > +					ret = -ENOTBLK;
> > +			} else {
> > +				ret = ext4_map_blocks(handle, inode, &map,
> > +						      EXT4_GET_BLOCKS_IO_CREATE_EXT);
> > +			}
> > +		}
> 
> Could you please check & confirm on below points -
> 1. Do you see a problem @above in case of *overwrite* with extents mapping?
> It will fall into EXT4_GET_BLOCKS_IO_CREATE_EXT case.
> So are we piggy backing on the fact that ext4_map_blocks first call
> ext4_ext_map_blocks
> with flags & EXT4_GET_BLOCKS_KEEP_SIZE. And so for overwrite case since it
> will return
> val > 0 then we will anyway not create any blocks and so we don't need to
> check overwrite
> case specifically here?
> 
> 
> 2. For cases with flags passed is 0 to ext4_map_blocks (overwrite &
> fallocate without extent case),
> we need not start the journaling transaction. But in above we are doing
> ext4_journal_start/stop unconditionally
> & unnecessarily reserving dio_credits blocks.
> We need to take care of that right?

Hm, I think you raise valid points here.

Jan, do you have any comments on the above? I vaguely remember having a
discussion around dropping the overwrite checks in ext4_iomap_begin() as we're
removing the inode_lock() early on in ext4_dio_write_iter(), so it woudln't be
necessary to do so. But, now that Ritesh mentioned it again I'm thinking it
may actually be required...

> >   		if (ret < 0) {
> >   			ext4_journal_stop(handle);
> >   			if (ret == -ENOSPC &&
> > @@ -3581,10 +3611,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> >   		iomap->type = delalloc ? IOMAP_DELALLOC : IOMAP_HOLE;
> >   		iomap->addr = IOMAP_NULL_ADDR;
> >   	} else {
> > -		if (map.m_flags & EXT4_MAP_MAPPED) {
> > -			iomap->type = IOMAP_MAPPED;
> > -		} else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> > +		if (map.m_flags & EXT4_MAP_UNWRITTEN) {
> >   			iomap->type = IOMAP_UNWRITTEN;
> > +		} else if (map.m_flags & EXT4_MAP_MAPPED) {
> > +			iomap->type = IOMAP_MAPPED;
> Maybe a comment as to explaining why checking UNWRITTEN before is necessary
> for others.
> So in case of fallocate & DIO write case we may get extent which is both
> unwritten & mapped (right?).
> so we need to check if we have an unwritten extent first so that it will
> need the conversion in ->end_io
> callback.

Yes, that is essentially correct.

--M

  reply index

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-12 12:52 [PATCH 0/5] ext4: direct IO via " Matthew Bobrowski
2019-08-12 12:52 ` [PATCH 1/5] ext4: introduce direct IO read code path using " Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-12 20:17     ` Matthew Wilcox
2019-08-13 10:45       ` Matthew Bobrowski
2019-08-12 12:52 ` [PATCH 2/5] ext4: move inode extension/truncate code out from ext4_iomap_end() Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-13 10:46     ` Matthew Bobrowski
2019-08-28 19:59   ` Jan Kara
2019-08-28 21:54     ` Matthew Bobrowski
2019-08-29  8:18       ` Jan Kara
2019-08-12 12:53 ` [PATCH 3/5] iomap: modify ->end_io() calling convention Matthew Bobrowski
2019-08-12 17:18   ` Christoph Hellwig
2019-08-13 10:43     ` Matthew Bobrowski
2019-08-12 12:53 ` [PATCH 4/5] ext4: introduce direct IO write code path using iomap infrastructure Matthew Bobrowski
2019-08-12 17:04   ` RITESH HARJANI
2019-08-13 12:58     ` Matthew Bobrowski [this message]
2019-08-13 14:35       ` Darrick J. Wong
2019-08-14  9:51         ` Matthew Bobrowski
2019-08-12 17:34   ` Christoph Hellwig
2019-08-13 10:45     ` Matthew Bobrowski
2019-08-28 20:26   ` Jan Kara
2019-08-28 22:32     ` Dave Chinner
2019-08-29  8:03       ` Jan Kara
2019-08-29 11:47       ` Matthew Bobrowski
2019-08-29 11:45     ` Matthew Bobrowski
2019-08-29 12:38       ` Jan Kara
2019-08-12 12:53 ` [PATCH 5/5] ext4: clean up redundant buffer_head direct IO code Matthew Bobrowski
2019-08-12 17:31 ` [PATCH 0/5] ext4: direct IO via iomap infrastructure RITESH HARJANI
2019-08-13 11:10   ` Matthew Bobrowski
2019-08-13 12:27     ` RITESH HARJANI
2019-08-14  9:48       ` Matthew Bobrowski
2019-08-14 11:58         ` RITESH HARJANI
2019-08-21 13:14       ` Matthew Bobrowski
2019-08-22 12:00         ` Matthew Bobrowski
2019-08-22 14:11           ` Ritesh Harjani
2019-08-24  3:18             ` Matthew Bobrowski
2019-08-24  3:55               ` Darrick J. Wong
2019-08-24 23:04                 ` Christoph Hellwig
2019-08-27  9:52                   ` Matthew Bobrowski
2019-08-28 12:05                     ` Matthew Bobrowski
2019-08-28 14:27                       ` Theodore Y. Ts'o
2019-08-28 18:02                         ` Jan Kara
2019-08-29  6:36                           ` Christoph Hellwig
2019-08-29 11:20                             ` Matthew Bobrowski
2019-08-29 14:41                               ` Christoph Hellwig
2019-08-23 13:43           ` [RFC 1/1] ext4: PoC implementation of option-1 Ritesh Harjani
2019-08-23 13:49             ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190813125840.GA10187@neo.Home \
    --to=mbobrowski@mbobrowski.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git