linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Matthew Bobrowski <mbobrowski@mbobrowski.org>,
	Christoph Hellwig <hch@infradead.org>,
	tytso@mit.edu, jack@suse.cz, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v3 5/6] ext4: introduce direct IO write path using iomap infrastructure
Date: Tue, 24 Sep 2019 12:57:30 +0200	[thread overview]
Message-ID: <20190924105730.GA11819@quack2.suse.cz> (raw)
In-Reply-To: <20190917090016.266CB520A1@d06av21.portsmouth.uk.ibm.com>

On Tue 17-09-19 14:30:15, Ritesh Harjani wrote:
> On 9/17/19 4:07 AM, Matthew Bobrowski wrote:
> > On Mon, Sep 16, 2019 at 05:12:48AM -0700, Christoph Hellwig wrote:
> > > > +	if (offset + count > i_size_read(inode) ||
> > > > +	    offset + count > EXT4_I(inode)->i_disksize) {
> > > > +		ext4_update_i_disksize(inode, inode->i_size);
> > > > +		extend = true;
> > > 
> > > Doesn't the ext4_update_i_disksize need to be under an open journal
> > > handle?
> > 
> > After all, it is a metadata update, which should go through an open journal
> > handle.
> 
> Hmmm, it seems like a race here. But I am not sure if this is just due to
> not updating i_disksize under open journal handle.
> 
> 
> So if we have a delayed buffered write to a file,
> in that case we first only update inode->i_size and update
> i_disksize at writeback time
> (i.e. during block allocation).
> In that case when we call for ext4_dio_write_iter
> since offset + len > i_disksize, we call for ext4_update_i_disksize().
> 
> Now if writeback for some reason failed. And the system crashes, during the
> DIO writes, after the blocks are allocated. Then during reboot we may have
> an inconsistent inode, since we did not add the inode into the
> orphan list before we updated the inode->i_disksize. And journal replay
> may not succeed.

OK, let me clear out some confusion here.

> 1. Can above actually happen? I am still not able to figure out the
>    race/inconsistency completely.
> 2. Can you please help explain under what other cases
>    it was necessary to call ext4_update_i_disksize() in DIO write paths?
> 3. When will i_disksize be out-of-sync with i_size during DIO writes?

So as I commented in my other reply to this patch, the code is definitely
wrong as is. The update of i_disksize in the original DIO code is connected
with the addition to the orphan list - we want to make sure orphan cleanup
in case of crash will truncate inode to the current i_size so we set
i_disksize to i_size. That being said I cannot find a reason why the update
would be actually necessary because at that point i_size should be equal to
i_disksize anyway. So the best theory I have is that it was added "just to
be sure" (the code got inherited from original ext3 codebase from before
git times).

To answer your other questions: If we decided to leave i_disksize update
in, the proper use would look like:

	handle = ext4_journal_start(...);
	if (ext4_update_i_disksize(inode))
		ext4_mark_inode_dirty(handle, inode);
	ext4_orphan_add(handle, inode);
	ext4_journal_stop();

Now the race you ask about in 1) can happen but it would be harmless.
As you mention buffered writeback does update i_disksize after submitting
IO to freshly allocated blocks (in the same transaction as block allocation
is) but stale data exposure is impossible because transaction commit will
wait for page writeback to complete.

To answer 3), i_disksize is not in sync with i_size either during truncate
(or similar operations generally protected by i_rwsem for writing) or
when there are pending delay allocated writes.

Now to answer 2) I don't think update of i_disksize is ever needed for DIO
path - once iomap_dio_rw() flushes pending dirty pages, i_disksize should
be equal to i_size since we hold i_rwsem at least for reading.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2019-09-24 12:57 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-12 11:03 [PATCH v3 0/6] ext4: port direct IO to iomap infrastructure Matthew Bobrowski
2019-09-12 11:03 ` [PATCH v3 1/6] ext4: introduce direct IO read path using " Matthew Bobrowski
2019-09-16 12:00   ` Christoph Hellwig
2019-09-16 13:07     ` Matthew Bobrowski
2019-09-12 11:04 ` [PATCH v3 2/6] ext4: move inode extension/truncate code out from ext4_iomap_end() Matthew Bobrowski
2019-09-23 16:21   ` Jan Kara
2019-09-24  9:50     ` Matthew Bobrowski
2019-09-24 13:13     ` Jan Kara
2019-09-12 11:04 ` [PATCH v3 3/6] iomap: split size and error for iomap_dio_rw ->end_io Matthew Bobrowski
2019-09-12 11:04 ` [PATCH v3 4/6] ext4: reorder map.m_flags checks in ext4_iomap_begin() Matthew Bobrowski
2019-09-16 12:05   ` Christoph Hellwig
2019-09-17 12:48     ` Matthew Bobrowski
2019-09-23 15:08   ` Jan Kara
2019-09-24  9:35     ` Matthew Bobrowski
2019-09-12 11:04 ` [PATCH v3 5/6] ext4: introduce direct IO write path using iomap infrastructure Matthew Bobrowski
2019-09-16  4:37   ` Ritesh Harjani
2019-09-16 10:14     ` Matthew Bobrowski
2019-09-16 12:12   ` Christoph Hellwig
2019-09-16 22:37     ` Matthew Bobrowski
2019-09-17  9:00       ` Ritesh Harjani
2019-09-17  9:02         ` Christoph Hellwig
2019-09-17 10:12           ` Ritesh Harjani
2019-09-17 12:39           ` Matthew Bobrowski
2019-09-24 10:57         ` Jan Kara [this message]
2019-09-17  9:06       ` Christoph Hellwig
2019-09-17 11:31         ` Matthew Bobrowski
2019-09-20 13:24         ` Matthew Bobrowski
2019-09-23 21:10   ` Jan Kara
2019-09-24 10:29     ` Matthew Bobrowski
2019-09-24 14:13       ` Jan Kara
2019-09-25  7:14         ` Matthew Bobrowski
2019-09-25  8:40           ` Jan Kara
2019-09-12 11:05 ` [PATCH v3 6/6] ext4: cleanup legacy buffer_head direct IO code Matthew Bobrowski
2019-09-16 12:06   ` Christoph Hellwig
2019-09-16 12:53     ` Matthew Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190924105730.GA11819@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mbobrowski@mbobrowski.org \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).