From: Dave Chinner <david@fromorbit.com> To: Boaz Harrosh <boaz@plexistor.com> Cc: ira.weiny@intel.com, linux-kernel@vger.kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>, "Darrick J. Wong" <darrick.wong@oracle.com>, Dan Williams <dan.j.williams@intel.com>, Christoph Hellwig <hch@lst.de>, "Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 0/5] Enable per-file/directory DAX operations Date: Fri, 25 Oct 2019 11:36:03 +1100 [thread overview] Message-ID: <20191025003603.GE4614@dread.disaster.area> (raw) In-Reply-To: <ab101f90-6ec1-7527-1859-5f6309640cfa@plexistor.com> On Fri, Oct 25, 2019 at 02:29:04AM +0300, Boaz Harrosh wrote: > On 25/10/2019 00:35, Dave Chinner wrote: > > On Thu, Oct 24, 2019 at 05:05:45PM +0300, Boaz Harrosh wrote: > > This isn't a theoretical problem - this is exactly the race > > condition that lead us to disabling the flag in the first place. > > There is no serialisation between the read and write parts of the > > page fault iand the filesystem changing the DAX flag and ops vector, > > and so fixing this problem requires hold yet more locks in the > > filesystem path to completely lock out page fault processing on the > > inode's mapping. > > > > Again sorry that I do not explain very good. > > Already on the read fault we populate the xarray, On a write fault we can have an empty xarray slot so the write fault needs to both populate the xarray slot (read fault) and process the write fault. > My point was that if I want to set the DAX mode I must enforce that > there are no other parallel users on my inode. The check that the > xarray is empty is my convoluted way to check that there are no other > users except me. If xarray is not empty I bail out with EBUISY Checking the xarray being empty is racy. The moment you drop the mapping lock, the page fault can populate a slot in the mapping that you just checked was empty. And then you swap the aops between the population and the ->page-mkwrite() call in the page fault that is running, and things go boom. Unless there's something new in the page fault path that nobody has noticed in the past couple of years, this TOCTOU race hasn't been solved.... > Perhaps we always go by the directory. And then do an mv dir_DAX/foo dir_NODAX/foo The inode is instatiated before the rename is run, so it's set up with it's old dir config, not the new one. So this ends up with the same problem of haivng to change the S_DAX flag and aops vector dynamically on rename. Same problem, not a solution. > to have an effective change. In hard links the first one at iget time before populating > the inode cache takes affect. If something like a find or backup program brings the inode into cache, the app may not even get the behaviour it wants, and it can't change it until the inode is evicted from cache, which may be never. Nobody wants implicit/random/uncontrollable/unchangeable behaviour like this. > (And never change the flag on the fly) > (Just brain storming here) We went over all this ground when we disabled the flag in the first place. We disabled the flag because we couldn't come up with a sane way to flip the ops vector short of tracking the number of aops calls in progress at any given time. i.e. reference counting the aops structure, but that's hard to do with a const ops structure, and so it got disabled rather than allowing users to crash kernels.... Cheers, -Dave. -- Dave Chinner david@fromorbit.com
next prev parent reply other threads:[~2019-10-25 0:36 UTC|newest] Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-10-20 15:59 ira.weiny 2019-10-20 15:59 ` [PATCH 1/5] fs/stat: Define DAX statx attribute ira.weiny 2019-10-22 11:32 ` Boaz Harrosh 2019-10-22 16:51 ` Ira Weiny 2019-10-20 15:59 ` [PATCH 2/5] fs/xfs: Isolate the physical DAX flag from effective ira.weiny 2019-10-21 0:26 ` Dave Chinner 2019-10-21 17:40 ` Ira Weiny 2019-10-20 15:59 ` [PATCH 3/5] fs/xfs: Separate functionality of xfs_inode_supports_dax() ira.weiny 2019-10-20 15:59 ` [PATCH 4/5] fs/xfs: Clean up DAX support check ira.weiny 2019-10-20 15:59 ` [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag ira.weiny 2019-10-21 0:45 ` Dave Chinner 2019-10-21 22:49 ` Ira Weiny 2019-10-21 23:46 ` Dave Chinner 2019-11-08 13:12 ` Jan Kara 2019-11-08 13:46 ` Jan Kara 2019-11-08 19:36 ` Ira Weiny 2019-11-11 16:07 ` Jan Kara 2019-11-11 23:54 ` Ira Weiny 2019-10-22 11:21 ` [PATCH 0/5] Enable per-file/directory DAX operations Boaz Harrosh 2019-10-23 13:09 ` Boaz Harrosh 2019-10-23 22:13 ` Dave Chinner 2019-10-24 2:31 ` Boaz Harrosh 2019-10-24 7:34 ` Dave Chinner 2019-10-24 14:05 ` Boaz Harrosh 2019-10-24 21:35 ` Dave Chinner 2019-10-24 23:29 ` Boaz Harrosh 2019-10-25 0:36 ` Dave Chinner [this message] 2019-10-25 1:15 ` Boaz Harrosh 2019-10-25 20:49 ` Ira Weiny 2019-10-27 22:10 ` Dave Chinner 2019-10-31 16:17 ` Ira Weiny 2019-11-01 22:47 ` Dave Chinner 2019-11-02 4:25 ` Dan Williams
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20191025003603.GE4614@dread.disaster.area \ --to=david@fromorbit.com \ --cc=boaz@plexistor.com \ --cc=dan.j.williams@intel.com \ --cc=darrick.wong@oracle.com \ --cc=hch@lst.de \ --cc=ira.weiny@intel.com \ --cc=jack@suse.cz \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-xfs@vger.kernel.org \ --cc=tytso@mit.edu \ --cc=viro@zeniv.linux.org.uk \ --subject='Re: [PATCH 0/5] Enable per-file/directory DAX operations' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.