linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	ira.weiny@intel.com, linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH V4 07/13] fs: Add locking for a dynamic address space operations state
Date: Tue, 25 Feb 2020 18:36:33 +0100	[thread overview]
Message-ID: <20200225173633.GA30843@lst.de> (raw)
In-Reply-To: <20200225000937.GA10776@dread.disaster.area>

On Tue, Feb 25, 2020 at 11:09:37AM +1100, Dave Chinner wrote:
> > No, the original code was broken because it didn't serialize between
> > DAX and buffer access.
> > 
> > Take a step back and look where the problems are, and they are not
> > mostly with the aops.  In fact the only aop useful for DAX is
> > is ->writepages, and how it uses ->writepages is actually a bit of
> > an abuse of that interface.
> 
> The races are all through the fops, too, which is one of the reasons
> Darrick mentioned we should probably move this up to file ops
> level...

But the file ops are very simple to use.  Pass the flag in the iocb,
and make sure the flag can only changed with i_rwsem held.  That part
is pretty trivial, the interesting case is mmap because it is so
spread out.

> > So what we really need it just a way to prevent switching the flag
> > when a file is mapped,
> 
> That's not sufficient.
> 
> We also have to prevent the file from being mapped *while we are
> switching*. Nothing in the mmap() path actually serialises against
> filesystem operations, and the initial behavioural checks in the
> page fault path are similarly unserialised against changing the
> S_DAX flag.

And the important part here is ->mmap.  If ->mmap doesn't get through
we are not going to see page faults.

> > and in the normal read/write path ensure the
> > flag can't be switch while I/O is going on, which could either be
> > done by ensuring it is only switched under i_rwsem or equivalent
> > protection, or by setting the DAX flag once in the iocb similar to
> > IOCB_DIRECT.
> 
> The iocb path is not the problem - that's entirely serialised
> against S_DAX changes by the i_rwsem. The problem is that we have no
> equivalent filesystem level serialisation for the entire mmap/page
> fault path, and it checks S_DAX all over the place.

Not right now.  We have various IS_DAX checks outside it.  But it is
easily fixable indeed.

> /me wonders if the best thing to do is to add a ->fault callout to
> tell the filesystem to lock/unlock the inode right up at the top of
> the page fault path, outside even the mmap_sem.  That means all the
> methods that the page fault calls are protected against S_DAX
> changes, and it gives us a low cost method of serialising page
> faults against DIO (e.g. via inode_dio_wait())....

Maybe.  Especially if it solves real problems and isn't just new
overhead to add an esoteric feature.

> 
> > And they easiest way to get all this done is as a first step to
> > just allowing switching the flag when no blocks are allocated at
> > all, similar to how the rt flag works.
> 
> False equivalence - it is not similar because the RT flag changes
> and their associated state checks are *already fully serialised* by
> the XFS_ILOCK_EXCL. S_DAX accesses have no such serialisation, and
> that's the problem we need to solve...

And my point is that if we ensure S_DAX can only be checked if there
are no blocks on the file, is is fairly easy to provide the same
guarantee.  And I've not heard any argument that we really need more
flexibility than that.  In fact I think just being able to change it
on the parent directory and inheriting the flag might be more than
plenty, which would lead to a very simple implementation without any
of the crazy overhead in this series.

  reply	other threads:[~2020-02-25 17:36 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21  0:41 [PATCH V4 00/13] Enable per-file/per-directory DAX operations V4 ira.weiny
2020-02-21  0:41 ` [PATCH V4 01/13] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-02-21  1:26   ` Dave Chinner
2020-02-27 17:52     ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 02/13] fs/xfs: Clarify lockdep dependency for xfs_isilocked() ira.weiny
2020-02-21  1:34   ` Dave Chinner
2020-02-21 23:00     ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 03/13] fs: Remove unneeded IS_DAX() check ira.weiny
2020-02-21  1:42   ` Dave Chinner
2020-02-21 23:04     ` Ira Weiny
2020-02-21 17:42   ` Christoph Hellwig
2020-02-21  0:41 ` [PATCH V4 04/13] fs/stat: Define DAX statx attribute ira.weiny
2020-02-21  0:41 ` [PATCH V4 05/13] fs/xfs: Isolate the physical DAX flag from enabled ira.weiny
2020-02-21  0:41 ` [PATCH V4 06/13] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-02-22  0:28   ` Darrick J. Wong
2020-02-23 15:07     ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 07/13] fs: Add locking for a dynamic address space operations state ira.weiny
2020-02-21 17:44   ` Christoph Hellwig
2020-02-21 22:44     ` Dave Chinner
2020-02-21 23:26       ` Dan Williams
2020-02-24 17:56       ` Christoph Hellwig
2020-02-25  0:09         ` Dave Chinner
2020-02-25 17:36           ` Christoph Hellwig [this message]
2020-02-25 19:37             ` Jeff Moyer
2020-02-26  9:28               ` Jonathan Halliday
2020-02-26 11:31                 ` Jan Kara
2020-02-26 11:56                   ` Jonathan Halliday
2020-02-26 16:10                 ` Ira Weiny
2020-02-26 16:46                 ` Dan Williams
2020-02-26 17:20                   ` Jan Kara
2020-02-26 17:54                     ` Dan Williams
2020-02-25 21:03             ` Ira Weiny
2020-02-26 11:17           ` Jan Kara
2020-02-26 15:57             ` Ira Weiny
2020-02-22  0:33   ` Darrick J. Wong
2020-02-23 15:03     ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 08/13] fs: Prevent DAX state change if file is mmap'ed ira.weiny
2020-02-21  0:41 ` [PATCH V4 09/13] fs/xfs: Add write aops lock to xfs layer ira.weiny
2020-02-22  0:31   ` Darrick J. Wong
2020-02-23 15:04     ` Ira Weiny
2020-02-24  0:34   ` Dave Chinner
2020-02-24 19:57     ` Ira Weiny
2020-02-24 22:32       ` Dave Chinner
2020-02-25 21:12         ` Ira Weiny
2020-02-25 22:59           ` Dave Chinner
2020-02-26 18:02             ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 10/13] fs/xfs: Clean up locking in dax invalidate ira.weiny
2020-02-21 17:45   ` Christoph Hellwig
2020-02-21 18:06     ` Ira Weiny
2020-02-21  0:41 ` [PATCH V4 11/13] fs/xfs: Allow toggle of effective DAX flag ira.weiny
2020-02-21  0:41 ` [PATCH V4 12/13] fs/xfs: Remove xfs_diflags_to_linux() ira.weiny
2020-02-21  0:41 ` [PATCH V4 13/13] Documentation/dax: Update Usage section ira.weiny
2020-02-26 22:48 ` [PATCH V4 00/13] Enable per-file/per-directory DAX operations V4 Jeff Moyer
2020-02-27  2:43   ` Ira Weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200225173633.GA30843@lst.de \
    --to=hch@lst.de \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).