All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: ira.weiny@intel.com, linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/5] Enable per-file/directory DAX operations
Date: Fri, 25 Oct 2019 11:36:03 +1100	[thread overview]
Message-ID: <20191025003603.GE4614@dread.disaster.area> (raw)
In-Reply-To: <ab101f90-6ec1-7527-1859-5f6309640cfa@plexistor.com>

On Fri, Oct 25, 2019 at 02:29:04AM +0300, Boaz Harrosh wrote:
> On 25/10/2019 00:35, Dave Chinner wrote:
> > On Thu, Oct 24, 2019 at 05:05:45PM +0300, Boaz Harrosh wrote:
> > This isn't a theoretical problem - this is exactly the race
> > condition that lead us to disabling the flag in the first place.
> > There is no serialisation between the read and write parts of the
> > page fault iand the filesystem changing the DAX flag and ops vector,
> > and so fixing this problem requires hold yet more locks in the
> > filesystem path to completely lock out page fault processing on the
> > inode's mapping.
> > 
> 
> Again sorry that I do not explain very good.
> 
> Already on the read fault we populate the xarray,

On a write fault we can have an empty xarray slot so the write fault
needs to both populate the xarray slot (read fault) and process the
write fault.

> My point was that if I want to set the DAX mode I must enforce that
> there are no other parallel users on my inode. The check that the
> xarray is empty is my convoluted way to check that there are no other
> users except me. If xarray is not empty I bail out with EBUISY

Checking the xarray being empty is racy. The moment you drop the
mapping lock, the page fault can populate a slot in the mapping that
you just checked was empty. And then you swap the aops between the
population and the ->page-mkwrite() call in the page fault
that is running, and things go boom.

Unless there's something new in the page fault path that nobody has
noticed in the past couple of years, this TOCTOU race hasn't been
solved....

> Perhaps we always go by the directory. And then do an mv dir_DAX/foo dir_NODAX/foo

The inode is instatiated before the rename is run, so it's set up
with it's old dir config, not the new one. So this ends up with the
same problem of haivng to change the S_DAX flag and aops vector
dynamically on rename. Same problem, not a solution.

> to have an effective change. In hard links the first one at iget time before populating
> the inode cache takes affect.

If something like a find or backup program brings the inode into
cache, the app may not even get the behaviour it wants, and it can't
change it until the inode is evicted from cache, which may be never.
Nobody wants implicit/random/uncontrollable/unchangeable behaviour
like this.

> (And never change the flag on the fly)
> (Just brain storming here)

We went over all this ground when we disabled the flag in the first
place. We disabled the flag because we couldn't come up with a sane
way to flip the ops vector short of tracking the number of aops
calls in progress at any given time. i.e. reference counting the
aops structure, but that's hard to do with a const ops structure,
and so it got disabled rather than allowing users to crash
kernels....

Cheers,

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-10-25  0:36 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-20 15:59 ira.weiny
2019-10-20 15:59 ` [PATCH 1/5] fs/stat: Define DAX statx attribute ira.weiny
2019-10-22 11:32   ` Boaz Harrosh
2019-10-22 16:51     ` Ira Weiny
2019-10-20 15:59 ` [PATCH 2/5] fs/xfs: Isolate the physical DAX flag from effective ira.weiny
2019-10-21  0:26   ` Dave Chinner
2019-10-21 17:40     ` Ira Weiny
2019-10-20 15:59 ` [PATCH 3/5] fs/xfs: Separate functionality of xfs_inode_supports_dax() ira.weiny
2019-10-20 15:59 ` [PATCH 4/5] fs/xfs: Clean up DAX support check ira.weiny
2019-10-20 15:59 ` [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag ira.weiny
2019-10-21  0:45   ` Dave Chinner
2019-10-21 22:49     ` Ira Weiny
2019-10-21 23:46       ` Dave Chinner
2019-11-08 13:12       ` Jan Kara
2019-11-08 13:46         ` Jan Kara
2019-11-08 19:36           ` Ira Weiny
2019-11-11 16:07             ` Jan Kara
2019-11-11 23:54               ` Ira Weiny
2019-10-22 11:21 ` [PATCH 0/5] Enable per-file/directory DAX operations Boaz Harrosh
2019-10-23 13:09   ` Boaz Harrosh
2019-10-23 22:13     ` Dave Chinner
2019-10-24  2:31       ` Boaz Harrosh
2019-10-24  7:34         ` Dave Chinner
2019-10-24 14:05           ` Boaz Harrosh
2019-10-24 21:35             ` Dave Chinner
2019-10-24 23:29               ` Boaz Harrosh
2019-10-25  0:36                 ` Dave Chinner [this message]
2019-10-25  1:15                   ` Boaz Harrosh
2019-10-25 20:49                   ` Ira Weiny
2019-10-27 22:10                     ` Dave Chinner
2019-10-31 16:17                       ` Ira Weiny
2019-11-01 22:47                         ` Dave Chinner
2019-11-02  4:25                           ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191025003603.GE4614@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=boaz@plexistor.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --subject='Re: [PATCH 0/5] Enable per-file/directory DAX operations' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.