From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-nvdimm@lists.01.org, xfs@oss.sgi.com
Subject: Re: [RFC PATCH] dax, ext2, ext4, XFS: fix data corruption race
Date: Tue, 26 Jan 2016 14:05:21 +0100
Message-ID: <20160126130521.GB23820@quack.suse.cz>
In-Reply-To: <20160126124812.GJ2948@linux.intel.com>

On Tue 26-01-16 07:48:12, Matthew Wilcox wrote:
> On Mon, Jan 25, 2016 at 02:59:21PM +0100, Jan Kara wrote:
> > On Mon 25-01-16 09:01:07, Dave Chinner wrote:
> > > What happens if we get rid of that DAX write fault optimisation that
> > > skips the initial read fault? The write fault will always run on a
> > > mapping that has a hole loaded, right?, so the race between
> > > dax_load_hole() and dax_insert_mapping() goes away, because nothing
> > > will be calling dax_load_hole() once the write fault is allocating
> > > blocks....
> > 
> > So frankly I don't like mixing of page locks into the DAX fault locking.
> > Also your scheme would require more tricks to deal with races between PMD
> > write faults racing with PTE read faults since you don't want to require
> > 2MB worth of hole-pages to be able to do a PMD write fault. Transparent
> > huge pages deal with this issue using compound pages but I'd like to avoid
> > that horror in the DAX path...
> 
> I *think* that what Dave's proposing (and if he isn't, I'm proposing it
> for him) is that the filesystem takes its allocation lock shared during
> the ->fault handler, then in the ->page_mkwrite handler, it knows that an
> allocation is coming, so it takes its allocation lock in exclusive mode.
> 
> So read vs write faults won't be able to race because the allocation lock
> will prevent it.

So this is a correct and clean design, but we will take the lock in
exclusive mode (and thus hurt scalability) for every write fault, not just
for the ones allocating blocks. And once we take the lock exclusively for
write faults, there is no longer any need to have the hole page
instantiated - we can still do it for simplicity, but it's no longer
necessary to avoid data corruption.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview:
2016-01-22 23:06 [RFC PATCH] dax, ext2, ext4, XFS: fix data corruption race Ross Zwisler
2016-01-23  2:01 ` Matthew Wilcox
2016-01-24 22:01 ` Dave Chinner
2016-01-25 13:59   ` Jan Kara
2016-01-26 12:48     ` Matthew Wilcox
2016-01-26 13:05       ` Jan Kara [this message]
2016-01-26 14:47         ` Matthew Wilcox
2016-01-25 20:46   ` Matthew Wilcox
2016-01-26  8:46     ` Jan Kara