linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: ira.weiny@intel.com, linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/5] Enable per-file/directory DAX operations
Date: Fri, 25 Oct 2019 08:35:08 +1100	[thread overview]
Message-ID: <20191024213508.GB4614@dread.disaster.area> (raw)
In-Reply-To: <fb4f8be7-bca6-733a-7f16-ced6557f7108@plexistor.com>

On Thu, Oct 24, 2019 at 05:05:45PM +0300, Boaz Harrosh wrote:
> On 24/10/2019 10:34, Dave Chinner wrote:
> > On Thu, Oct 24, 2019 at 05:31:13AM +0300, Boaz Harrosh wrote:
> <>
> > 
> > The on disk DAX flag is inherited from the parent directory at
> > create time. Hence an admin only need to set it on the data
> > directory of the application when first configuring it, and
> > everything the app creates will be configured for DAX access
> > automatically.
> > 
> 
> Yes I said that as well.

You said "it must be set between creation and first write",
stating the requirement for an on-disk flag to work. I'm
decribing how that requirement is actually implemented. i.e. what
you are stating is something we actually implemented years ago...

> > I also seem
> > to recall that there was a need to take some vm level lock to really
> > prevent page fault races, and that we can't safely take that in a
> > safe combination with all the filesystem locks we need.
> > 
> 
> We do not really care with page fault races in the Kernel as long

Oh yes we do. A write fault is a 2-part operation - a read fault to
populate the pte and mapping, then a write fault (->page_mkwrite) to 
do all the filesystem work needed to dirty the page and pte.

The read fault sets up the state for the write fault, and if we
change the aops between these two operations, then the
->page_mkwrite implementation goes kaboom.

This isn't a theoretical problem - this is exactly the race
condition that lead us to disabling the flag in the first place.
There is no serialisation between the read and write parts of the
page fault iand the filesystem changing the DAX flag and ops vector,
and so fixing this problem requires hold yet more locks in the
filesystem path to completely lock out page fault processing on the
inode's mapping.

> as I protect the xarray access and these are protected well if we
> take truncate locking. But we have a bigger problem that you pointed
> out with the change of the operations vector pointer.
> 
> I was thinking about this last night. One way to do this is with
> file-exclusive-lock. Correct me if I'm wrong:
> file-exclusive-readwrite-lock means any other openers will fail and
> if there are openers already the lock will fail. Which is what we want
> no?

The filesystem ioctls and page faults have no visibility of file
locks.  They don't know and can't find out in a sane manner that an
inode has a single -user- reference.

And it introduces a new problem for any application using the
fssetxattr() ioctl - accidentally not setting the S_DAX flag to be
unmodified will now fail, and that means such a change breaks
existing applications. Sure, you can say they are "buggy
applications", but the fact is this user API change breaks them.

Hence I don't think we can change the user API for setting/clearing
this flag like this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-10-24 21:35 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-20 15:59 [PATCH 0/5] Enable per-file/directory DAX operations ira.weiny
2019-10-20 15:59 ` [PATCH 1/5] fs/stat: Define DAX statx attribute ira.weiny
2019-10-22 11:32   ` Boaz Harrosh
2019-10-22 16:51     ` Ira Weiny
2019-10-20 15:59 ` [PATCH 2/5] fs/xfs: Isolate the physical DAX flag from effective ira.weiny
2019-10-21  0:26   ` Dave Chinner
2019-10-21 17:40     ` Ira Weiny
2019-10-20 15:59 ` [PATCH 3/5] fs/xfs: Separate functionality of xfs_inode_supports_dax() ira.weiny
2019-10-20 15:59 ` [PATCH 4/5] fs/xfs: Clean up DAX support check ira.weiny
2019-10-20 15:59 ` [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag ira.weiny
2019-10-21  0:45   ` Dave Chinner
2019-10-21 22:49     ` Ira Weiny
2019-10-21 23:46       ` Dave Chinner
2019-11-08 13:12       ` Jan Kara
2019-11-08 13:46         ` Jan Kara
2019-11-08 19:36           ` Ira Weiny
2019-11-11 16:07             ` Jan Kara
2019-11-11 23:54               ` Ira Weiny
2019-10-22 11:21 ` [PATCH 0/5] Enable per-file/directory DAX operations Boaz Harrosh
2019-10-23 13:09   ` Boaz Harrosh
2019-10-23 22:13     ` Dave Chinner
2019-10-24  2:31       ` Boaz Harrosh
2019-10-24  7:34         ` Dave Chinner
2019-10-24 14:05           ` Boaz Harrosh
2019-10-24 21:35             ` Dave Chinner [this message]
2019-10-24 23:29               ` Boaz Harrosh
2019-10-25  0:36                 ` Dave Chinner
2019-10-25  1:15                   ` Boaz Harrosh
2019-10-25 20:49                   ` Ira Weiny
2019-10-27 22:10                     ` Dave Chinner
2019-10-31 16:17                       ` Ira Weiny
2019-11-01 22:47                         ` Dave Chinner
2019-11-02  4:25                           ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191024213508.GB4614@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=boaz@plexistor.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).