From: Jan Kara <jack@suse.cz>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Jan Kara <jack@suse.cz>, NeilBrown <neilb@suse.de>,
	axboe@kernel.dk, lucho@ionkov.net, ericvh@gmail.com,
	tytso@mit.edu, rminnich@sandia.gov, viro@zeniv.linux.org.uk,
	martin.petersen@oracle.com, david@fromorbit.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	adilger.kernel@dilger.ca, bharrosh@panasas.com,
	jlayton@samba.org, v9fs-developer@lists.sourceforge.net,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 4/4] ext3: Warn if mounting rw on a disk requiring stable page writes
Date: Wed, 5 Dec 2012 13:12:28 +0100
Message-ID: <20121205121228.GB5706@quack.suse.cz>
In-Reply-To: <20121127021740.GA11869@blackbox.djwong.org>

On Mon 26-11-12 18:17:40, Darrick J. Wong wrote:
> On Thu, Nov 22, 2012 at 10:12:40AM +0100, Jan Kara wrote:
> > On Wed 21-11-12 17:47:55, Darrick J. Wong wrote:
> > > On Thu, Nov 22, 2012 at 08:47:13AM +1100, NeilBrown wrote:
> > > > On Wed, 21 Nov 2012 22:33:33 +0100 Jan Kara <jack@suse.cz> wrote:
> > > > 
> > > > > On Wed 21-11-12 13:13:19, Darrick J. Wong wrote:
> > > > > > On Wed, Nov 21, 2012 at 03:15:43AM +0100, Jan Kara wrote:
> > > > > > > On Tue 20-11-12 18:00:56, Darrick J. Wong wrote:
> > > > > > > > ext3 doesn't properly isolate pages from changes during writeback.  Since the
> > > > > > > > recommended fix is to use ext4, for now we'll just print a warning if the user
> > > > > > > > tries to mount in write mode.
> > > > > > > > 
> > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > > ---
> > > > > > > >  fs/ext3/super.c |    8 ++++++++
> > > > > > > >  1 file changed, 8 insertions(+)
> > > > > > > > 
> > > > > > > > 
> > > > > > > > diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> > > > > > > > index 5366393..5b3725d 100644
> > > > > > > > --- a/fs/ext3/super.c
> > > > > > > > +++ b/fs/ext3/super.c
> > > > > > > > @@ -1325,6 +1325,14 @@ static int ext3_setup_super(struct super_block *sb, struct ext3_super_block *es,
> > > > > > > >  			"forcing read-only mode");
> > > > > > > >  		res = MS_RDONLY;
> > > > > > > >  	}
> > > > > > > > +	if (!read_only &&
> > > > > > > > +	    queue_requires_stable_pages(bdev_get_queue(sb->s_bdev))) {
> > > > > > > > +		ext3_msg(sb, KERN_ERR,
> > > > > > > > +			"error: ext3 cannot safely write data to a disk "
> > > > > > > > +			"requiring stable pages writes; forcing read-only "
> > > > > > > > +			"mode.  Upgrading to ext4 is recommended.");
> > > > > > > > +		res = MS_RDONLY;
> > > > > > > > +	}
> > > > > > > >  	if (read_only)
> > > > > > > >  		return res;
> > > > > > > >  	if (!(sbi->s_mount_state & EXT3_VALID_FS))
> > > > > > >   Why this? ext3 should be fixed by your change to
> > > > > > > filemap_page_mkwrite()... Or does testing show otherwise?
> > > > > > 
> > > > > > Yes, it's still broken even with this new set of changes.  Now that I think
> > > > > > about it a little more, I recall that writeback mode was actually fine, so this
> > > > > > is a little harsh.
> > > > > > 
> > > > > > Hm... looking at the ordered code a little more, it looks like
> > > > > > ext3_ordered_write_end is calling journal_dirty_data_fn, which (I guess?) tries
> > > > > > to write mapped buffers back through the journal?  Taking it out seems to fix
> > > > > > ordered mode, though I have a suspicion that it might very well break ordered
> > > > > > mode too.
> > > > >   Oh, right. kjournald writing buffers directly (without setting
> > > > > PageWriteback) will break things. So please, change warning to:
> > > 
> > > Maybe we should just fix this anyway?
> > > 
> > > I still have the patch that adds PG_stable (and changes the
> > > wait_for_page_stable() test to use this flag instead of PG_writeback) kicking
> > > around in my tree.  I wrote a patch to jbd that changes journal_do_submit_data
> > > to set PG_stable, call clear_page_dirty_for_io(), and unsets the stable bit in
> > > the end_io processing.
> > > 
> > > It seems to get rid of the checksum-after-write errors, though I'm not
> > > convinced it's correct.  But, I'll send both patches along.
> >   I'll check the patches. Fixing PageWriteback logic for ext3 is not easily
> > doable due to lock ranking constraints - PageWriteback has to be set under
> > PageLocked but that ranks above transaction start so kjournald cannot grab
> > page locks so it cannot set PageWriteback... And changing the lock ordering
> > is a major surgery.
> > 
> > What could be doable is waiting for buffer locks from ext3's ->write_begin
> > and ->page_mkwrite implementations in case stable writes are required. That
> > would be the fallback if your approach with a separate page bit doesn't work
> > out (and I have some doubts about that as mm people are *really* thrifty
> > with page bits).
> > 
> > > > > 	/*
> > > > > 	 * In data=ordered mode, kjournald writes buffers without setting
> > > > > 	 * PageWriteback bit thus generic code does not properly wait for
> > > > > 	 * writeback of those buffers to finish.
> > > > > 	 */
> > > > > 	if (!read_only &&
> > > > > 	    test_opt(sb, DATA_FLAGS) == EXT3_MOUNT_ORDERED_DATA &&
> > > 
> > > test_opt(sb, DATA_FLAGS) != EXT3_MOUNT_WRITEBACK_DATA
> > > 
> > > since I bet data=journal mode is also borken wrt PageWriteback.
> >   It is broken wrt PageWriteback but it actually waits for buffer locks in
> > ->write_begin() so at least write path should be properly protected. But
> > mmap is not handled properly there (although that wouldn't be that hard to
> > fix). So I agree the condition should rather be what you suggest.
  Sorry for the late reply. I was on vacation...

> Hm.  In journal mode, write_begin calls do_journal_get_write_access on each
> buffer for a given page, and in turn, jbd's do_get_write_access calls
> lock_buffer.  Is that what you're referring to by "actually waits for buffer
> locks"?  I'm wondering how that helps us, since afaict PG_writeback doesn't get
> set in that path, and I think it's a little early to be setting PG_writeback
> anyway.
  It does help us. In ext3, data writeback is done either by the flusher
thread (that happens under PG_writeback, and generic code waits for that as
needed) or by kjournald. The latter happens under the buffer lock and, as you
correctly observed, do_get_write_access() waits for that (and actually copies
the data that should go to disk to a separate buffer if needed).
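
To illustrate the two writeback paths, here is a minimal userspace sketch. It
is not kernel code: the model_* names, enum wb_source and struct model_page are
all hypothetical, and the only thing modelled is which indicator (PG_writeback
or the buffer lock) each writer leaves visible to a waiter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not kernel structures. */
enum wb_source {
	WB_NONE,      /* no writeback in flight */
	WB_FLUSHER,   /* flusher thread: runs with PG_writeback set */
	WB_KJOURNALD  /* kjournald: runs under the buffer lock only */
};

struct model_page {
	bool pg_writeback;  /* stands in for PG_writeback */
	bool buffer_locked; /* stands in for the buffer lock */
};

/* Record which indicator each writeback source leaves visible. */
static struct model_page model_start_writeback(enum wb_source src)
{
	struct model_page page = { false, false };

	if (src == WB_FLUSHER)
		page.pg_writeback = true;
	else if (src == WB_KJOURNALD)
		page.buffer_locked = true;
	return page;
}

/* Generic code only looks at PG_writeback. */
static bool model_generic_wait_sees(const struct model_page *page)
{
	return page->pg_writeback;
}

/*
 * A path that additionally takes the buffer lock, as do_get_write_access()
 * does. Combining both checks in one helper is an assumption of this model.
 */
static bool model_buffer_lock_wait_sees(const struct model_page *page)
{
	return page->pg_writeback || page->buffer_locked;
}

/* Walk through both writers and check who notices them. */
static void model_self_check(void)
{
	struct model_page page = model_start_writeback(WB_FLUSHER);

	assert(model_generic_wait_sees(&page));
	assert(model_buffer_lock_wait_sees(&page));

	page = model_start_writeback(WB_KJOURNALD);
	assert(!model_generic_wait_sees(&page)); /* the gap in ordered mode */
	assert(model_buffer_lock_wait_sees(&page));
}
```

Calling model_self_check() from a test program exercises both cases. The
kjournald case is the interesting one: only the waiter that takes the buffer
lock notices the write in flight.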

> If the page has to be locked before the transaction starts, how much of a
> problem is it to set PG_writeback?  Even though that seems a bit early to be
> doing that?
  Well, what you would need to make things consistent is to set
PG_writeback from kjournald so that all writeback happens with PG_writeback
set on the page. But setting it has to happen while the page is locked, and
kjournald can never block on the page lock because that would cause
deadlocks...
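
The ordering constraint can be sketched as a small userspace rank check. This
is only an illustration: enum model_rank and the model_* helpers are
hypothetical, the numeric ranks are made up, and placing the buffer lock after
transaction start is an assumption based on do_get_write_access() calling
lock_buffer().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a smaller number must be taken first. */
enum model_rank {
	RANK_NONE = 0,
	RANK_PAGE_LOCK = 1,    /* PageLocked */
	RANK_TRANSACTION = 2,  /* journal transaction start */
	RANK_BUFFER_LOCK = 3   /* buffer lock; placement is an assumption */
};

/* A context may only block on a lock ranked after everything it holds. */
static bool model_may_block_on(enum model_rank highest_held,
			       enum model_rank wanted)
{
	return wanted > highest_held;
}

/* Setting PG_writeback requires the page lock, so it inherits that rule. */
static bool model_can_set_pg_writeback(enum model_rank highest_held)
{
	return model_may_block_on(highest_held, RANK_PAGE_LOCK);
}

static void model_lock_order_check(void)
{
	/* A writer locks the page first and then starts a transaction. */
	assert(model_may_block_on(RANK_PAGE_LOCK, RANK_TRANSACTION));
	/* kjournald is treated here as already being at transaction rank. */
	assert(!model_can_set_pg_writeback(RANK_TRANSACTION));
	/* It can still take buffer locks from that position. */
	assert(model_may_block_on(RANK_TRANSACTION, RANK_BUFFER_LOCK));
}
```

Calling model_lock_order_check() runs the three cases. The second assertion
is the one that matters: from transaction rank, blocking on the page lock
would invert the order.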

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 27+ messages
2012-11-21  2:00 [PATCH v2.1 0/3] mm/fs: Implement faster stable page writes on filesystems Darrick J. Wong
2012-11-21  2:00 ` [PATCH 1/4] bdi: Track users that require stable page writes Darrick J. Wong
2012-11-21  7:54   ` Christoph Hellwig
2012-11-21 10:52     ` Christoph Hellwig
2012-11-21 10:56   ` Christoph Hellwig
2012-11-21 21:52     ` Darrick J. Wong
2012-11-21 22:06       ` NeilBrown
2012-11-22  2:33         ` [PATCH] " Darrick J. Wong
2012-11-22  7:08           ` Christoph Hellwig
2012-11-21  2:00 ` [PATCH 2/4] mm: Only enforce stable page writes if the backing device requires it Darrick J. Wong
2012-11-21 10:57   ` Christoph Hellwig
2012-11-21  2:00 ` [PATCH 3/4] 9pfs: Fix filesystem to wait for stable page writeback Darrick J. Wong
2012-11-21  2:00 ` [PATCH 4/4] ext3: Warn if mounting rw on a disk requiring stable page writes Darrick J. Wong
2012-11-21  2:15   ` Jan Kara
2012-11-21 21:13     ` Darrick J. Wong
2012-11-21 21:33       ` Jan Kara
2012-11-21 21:47         ` NeilBrown
2012-11-22  1:47           ` Darrick J. Wong
2012-11-22  2:36             ` [RFC PATCH 1/2] mm: Introduce page flag to indicate stable page status Darrick J. Wong
2012-11-22  2:36             ` [RFC PATCH 2/2] jbd: Stabilize pages during writes when in ordered mode Darrick J. Wong
2012-11-22  9:19               ` Jan Kara
2012-11-22  9:12             ` [PATCH 4/4] ext3: Warn if mounting rw on a disk requiring stable page writes Jan Kara
2012-11-27  2:17               ` Darrick J. Wong
2012-12-05 12:12                 ` Jan Kara [this message]
2012-12-08  1:09                   ` Darrick J. Wong
2012-12-10 10:41                     ` Jan Kara
2012-11-22 23:15             ` Dave Chinner
