From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Dan Williams <dan.j.williams@intel.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-ext4 <linux-ext4@vger.kernel.org>,
linux-xfs <linux-xfs@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5
Date: Fri, 3 Apr 2020 11:37:46 -0700 [thread overview]
Message-ID: <20200403183746.GQ80283@magnolia> (raw)
In-Reply-To: <20200403181843.GK3952565@iweiny-DESK2.sc.intel.com>
On Fri, Apr 03, 2020 at 11:18:43AM -0700, Ira Weiny wrote:
> On Fri, Apr 03, 2020 at 07:03:38PM +0200, Jan Kara wrote:
> > On Fri 03-04-20 08:48:29, Ira Weiny wrote:
> > > On Fri, Apr 03, 2020 at 09:27:31AM +0200, Christoph Hellwig wrote:
> > > > On Thu, Apr 02, 2020 at 01:55:19PM -0700, Ira Weiny wrote:
> > > > > > I'd just return an error for that case, don't play silly games like
> > > > > > evicting the inode.
> > > > >
> > > > > I think I agree with Christoph here. But I want to clarify. I was heading in
> > > > > a direction of failing the ioctl completely. But we could have the flag change
> > > > > with an appropriate error which could let the user know the change has been
> > > > > delayed.
> > > > >
> > > > > But I don't immediately see what error code is appropriate for such an
> > > > > indication. Candidates I can envision:
> > > > >
> > > > > EAGAIN
> > > > > ERESTART
> > > > > EUSERS
> > > > > EINPROGRESS
> > > > >
> > > > > None are perfect but I'm leaning toward EINPROGRESS.
> > > >
> > > > I really, really dislike that idea. The whole point of not forcing
> > > > evictions is to make it clear - no this inode is "busy" you can't
> > > > do that. A reasonably smart application can try to evict itself.
> > >
> > > I don't understand. What Darrick proposed would never need any
> > > evictions. If the file has blocks allocated the FS_XFLAG_DAX flag can
> > > not be changed. So I don't see what good eviction would do at all.
> >
> > I guess there's some confusion here (may well be than on my side). Darrick
> > propose that we can switch FS_XFLAG_DAX only when file has no blocks
> > allocated - fine by me. But that still does not mean than we can switch
> > S_DAX immediately, does it? Because that would still mean we need to switch
> > aops on living inode and that's ... difficult and Christoph didn't want to
> > clutter the code with it.
> >
> > So I've understood Darrick's proposal as: Just switch FS_XFLAG_DAX flag,
> > S_DAX flag will magically switch when inode gets evicted and the inode gets
> > reloaded from the disk again. Did I misunderstand anything?
> >
> > And my thinking was that this is surprising behavior for the user and so it
> > will likely generate lots of bug reports along the lines of "DAX inode flag
> > does not work!". So I was pondering how to make the behavior less
> > confusing... The ioctl I've suggested was just a poor attempt at that.
>
> Ok but then I don't understand Christophs comment to "just return an error for
> that case"? Which case?
>
> >
> > > > But returning an error and doing a lazy change anyway is straight from
> > > > the playbook for arcane and confusing API designs.
> > >
> > > Jan countered with a proposal that the FS_XFLAG_DAX does change with
> > > blocks allocated. But that S_DAX would change on eviction. Adding that
> > > some eviction ioctl could be added.
> >
> > No, I didn't mean that we can change FS_XFLAG_DAX with blocks allocated. I
> > was still speaking about the case without blocks allocated.
>
> Ah ok good point. But again what 'error' do we return when FS_XFLAG_DAX
> changed but S_DAX did not?
>
> >
> > > You then proposed just returning an error for that case. (This lead me to
> > > believe that you were ok with an eviction based change of S_DAX.)
> > >
> > > So I agreed that changing S_DAX could be delayed until an explicit eviction.
> > > But, to aid the 'smart application', a different error code could be used to
> > > indicate that the FS_XFLAG_DAX had been changed but that until that explicit
> > > eviction occurs S_DAX would remain.
> > >
> > > So I don't fully follow what you mean by 'lazy change'?
> > >
> > > Do you still really, really dislike an explicit eviction method for changing
> > > the S_DAX flag?
> > >
> > > If FS_XFLAG_DAX can never be changed on a file with blocks allocated and the
> > > user wants to change the mode of operations on their 'data'; they would have to
> > > create a new file with the proper setting and move the data there. For example
> > > copy the file into a directory marked FS_XFLAG_DAX==true?
> > >
> > > I'm ok with either interface as I think both could be clear if documented.
> >
> > I agree that what Darrick suggested is technically easily doable and can be
> > documented. But it is not natural behavior (i.e., different than all inode
> > flags we have) and we know how careful people are when reading
> > documentation...
> >
>
> Ok For 5.8 why don't we not allow FS_XFLAG_DAX to be changed on files _at_
> _all_...
>
> In summary:
>
> - Applications must call statx to discover the current S_DAX state.
Ok.
> - There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
> the parent directory FS_XFLAG_DAX inode flag. (There is no way to change
> this flag after file creation.)
>
> If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
> inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> Unless overridden...
Ok, fine with me. :)
> - There exists a dax= mount option.
>
> "-o dax=off" means "never set S_DAX, ignore FS_XFLAG_DAX"
> "-o nodax" means "dax=off"
I surveyed the three fses that support dax and found that none of the
filesystems actually have a 'nodax' flag. Now would be the time not to
add such a thing, and make people specify dax=off instead. It would
be handy if we could have a single fsparam_enum for figuring out the dax
mount options.
> "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
> "-o dax" by itself means "dax=always"
> "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
>
> - There exists an advisory directory inode flag FS_XFLAG_DAX that can be
> changed at any time. The flag state is copied into any files or
> subdirectories when they are created within that directory. If programs
> require file access runs in S_DAX mode, they'll have to create those files
"...they must create..."
> inside a directory with FS_XFLAG_DAX set, or mount the fs with an
> appropriate dax mount option.
Otherwise seems ok to me.
--D
>
>
> ???
>
> Ira
>
next prev parent reply other threads:[~2020-04-03 18:40 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-27 5:24 [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5 ira.weiny
2020-02-27 5:24 ` [PATCH V5 01/12] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-02-27 17:25 ` Ira Weiny
2020-02-27 5:24 ` [PATCH V5 02/12] fs: Remove unneeded IS_DAX() check ira.weiny
2020-02-27 5:24 ` [PATCH V5 03/12] fs/stat: Define DAX statx attribute ira.weiny
2020-02-27 5:24 ` [PATCH V5 04/12] fs/xfs: Isolate the physical DAX flag from enabled ira.weiny
2020-02-27 5:24 ` [PATCH V5 05/12] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-03-01 22:37 ` Dave Chinner
2020-02-27 5:24 ` [PATCH V5 06/12] fs: Add locking for a dynamic address space operations state ira.weiny
2020-03-02 1:26 ` Dave Chinner
2020-03-02 1:36 ` Dave Chinner
2020-02-27 5:24 ` [PATCH V5 07/12] fs: Prevent DAX state change if file is mmap'ed ira.weiny
2020-02-27 5:24 ` [PATCH V5 08/12] fs/xfs: Hold off aops users while changing DAX state ira.weiny
2020-02-27 5:24 ` [PATCH V5 09/12] fs/xfs: Clean up locking in dax invalidate ira.weiny
2020-02-27 5:24 ` [PATCH V5 10/12] fs/xfs: Allow toggle of effective DAX flag ira.weiny
2020-02-27 5:24 ` [PATCH V5 11/12] fs/xfs: Remove xfs_diflags_to_linux() ira.weiny
2020-02-27 5:24 ` [PATCH V5 12/12] Documentation/dax: Update Usage section ira.weiny
2020-03-05 15:51 ` [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5 Christoph Hellwig
2020-03-09 17:04 ` Ira Weiny
2020-03-11 3:36 ` Darrick J. Wong
2020-03-11 6:29 ` Christoph Hellwig
2020-03-11 17:07 ` Dan Williams
2020-03-16 9:52 ` Jan Kara
2020-03-16 9:55 ` Christoph Hellwig
2020-04-01 4:00 ` Darrick J. Wong
2020-04-01 10:25 ` Jan Kara
2020-04-02 8:53 ` Christoph Hellwig
2020-04-02 20:55 ` Ira Weiny
2020-04-03 7:27 ` Christoph Hellwig
2020-04-03 15:48 ` Ira Weiny
2020-04-03 17:03 ` Jan Kara
2020-04-03 18:18 ` Ira Weiny
2020-04-03 18:21 ` Ira Weiny
2020-04-03 18:37 ` Darrick J. Wong [this message]
2020-04-05 6:19 ` Ira Weiny
2020-04-06 10:00 ` Jan Kara
2020-04-03 18:29 ` Darrick J. Wong
2020-04-03 16:05 ` Darrick J. Wong
2020-04-03 4:39 ` Ira Weiny
2020-03-11 6:39 ` Dave Chinner
2020-03-11 6:44 ` Christoph Hellwig
2020-03-11 17:07 ` Dan Williams
2020-03-12 0:49 ` Dave Chinner
2020-03-12 3:00 ` Darrick J. Wong
2020-03-12 7:27 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200403183746.GQ80283@magnolia \
--to=darrick.wong@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=ira.weiny@intel.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).