Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	Dan Williams <dan.j.williams@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5
Date: Fri, 3 Apr 2020 11:37:46 -0700
Message-ID: <20200403183746.GQ80283@magnolia> (raw)
In-Reply-To: <20200403181843.GK3952565@iweiny-DESK2.sc.intel.com>

On Fri, Apr 03, 2020 at 11:18:43AM -0700, Ira Weiny wrote:
> On Fri, Apr 03, 2020 at 07:03:38PM +0200, Jan Kara wrote:
> > On Fri 03-04-20 08:48:29, Ira Weiny wrote:
> > > On Fri, Apr 03, 2020 at 09:27:31AM +0200, Christoph Hellwig wrote:
> > > > On Thu, Apr 02, 2020 at 01:55:19PM -0700, Ira Weiny wrote:
> > > > > > I'd just return an error for that case, don't play silly games like
> > > > > > evicting the inode.
> > > > > 
> > > > > I think I agree with Christoph here.  But I want to clarify.  I was heading in
> > > > > a direction of failing the ioctl completely.  But we could have the flag change
> > > > > with an appropriate error which could let the user know the change has been
> > > > > delayed.
> > > > > 
> > > > > But I don't immediately see what error code is appropriate for such an
> > > > > indication.  Candidates I can envision:
> > > > > 
> > > > > EAGAIN
> > > > > ERESTART
> > > > > EUSERS
> > > > > EINPROGRESS
> > > > > 
> > > > > None are perfect but I'm leaning toward EINPROGRESS.
> > > > 
> > > > I really, really dislike that idea.  The whole point of not forcing
> > > > evictions is to make it clear - no this inode is "busy" you can't
> > > > do that.  A reasonably smart application can try to evict itself.
> > > 
> > > I don't understand.  What Darrick proposed would never need any
> > > evictions.  If the file has blocks allocated the FS_XFLAG_DAX flag can
> > > not be changed.  So I don't see what good eviction would do at all.
> > 
> > I guess there's some confusion here (may well be than on my side). Darrick
> > propose that we can switch FS_XFLAG_DAX only when file has no blocks
> > allocated - fine by me. But that still does not mean than we can switch
> > S_DAX immediately, does it? Because that would still mean we need to switch
> > aops on living inode and that's ... difficult and Christoph didn't want to
> > clutter the code with it.
> > 
> > So I've understood Darrick's proposal as: Just switch FS_XFLAG_DAX flag,
> > S_DAX flag will magically switch when inode gets evicted and the inode gets
> > reloaded from the disk again. Did I misunderstand anything?
> > 
> > And my thinking was that this is surprising behavior for the user and so it
> > will likely generate lots of bug reports along the lines of "DAX inode flag
> > does not work!". So I was pondering how to make the behavior less
> > confusing... The ioctl I've suggested was just a poor attempt at that.
> 
> Ok but then I don't understand Christophs comment to "just return an error for
> that case"?  Which case?
> 
> > 
> > > > But returning an error and doing a lazy change anyway is straight from
> > > > the playbook for arcane and confusing API designs.
> > > 
> > > Jan countered with a proposal that the FS_XFLAG_DAX does change with
> > > blocks allocated.  But that S_DAX would change on eviction.  Adding that
> > > some eviction ioctl could be added.
> > 
> > No, I didn't mean that we can change FS_XFLAG_DAX with blocks allocated. I
> > was still speaking about the case without blocks allocated.
> 
> Ah ok good point.  But again what 'error' do we return when FS_XFLAG_DAX
> changed but S_DAX did not?
> 
> > 
> > > You then proposed just returning an error for that case.  (This lead me to
> > > believe that you were ok with an eviction based change of S_DAX.)
> > > 
> > > So I agreed that changing S_DAX could be delayed until an explicit eviction.
> > > But, to aid the 'smart application', a different error code could be used to
> > > indicate that the FS_XFLAG_DAX had been changed but that until that explicit
> > > eviction occurs S_DAX would remain.
> > > 
> > > So I don't fully follow what you mean by 'lazy change'?
> > > 
> > > Do you still really, really dislike an explicit eviction method for changing
> > > the S_DAX flag?
> > > 
> > > If FS_XFLAG_DAX can never be changed on a file with blocks allocated and the
> > > user wants to change the mode of operations on their 'data'; they would have to
> > > create a new file with the proper setting and move the data there.  For example
> > > copy the file into a directory marked FS_XFLAG_DAX==true?
> > > 
> > > I'm ok with either interface as I think both could be clear if documented.
> > 
> > I agree that what Darrick suggested is technically easily doable and can be
> > documented. But it is not natural behavior (i.e., different than all inode
> > flags we have) and we know how careful people are when reading
> > documentation...
> > 
> 
> Ok For 5.8 why don't we not allow FS_XFLAG_DAX to be changed on files _at_
> _all_...
> 
> In summary:
> 
>  - Applications must call statx to discover the current S_DAX state.

Ok.

>  - There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
>    the parent directory FS_XFLAG_DAX inode flag.  (There is no way to change
>    this flag after file creation.)
> 
>    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
>    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
>    Unless overridden...

Ok, fine with me. :)

>  - There exists a dax= mount option.
> 
>    "-o dax=off" means "never set S_DAX, ignore FS_XFLAG_DAX"
>    	"-o nodax" means "dax=off"

I surveyed the three fses that support dax and found that none of the
filesystems actually have a 'nodax' flag.  Now would be the time not to
add such a thing, and make people specify dax=off instead.  It would
be handy if we could have a single fsparam_enum for figuring out the dax
mount options.

>    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
>    	"-o dax" by itself means "dax=always"
>    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
> 
>  - There exists an advisory directory inode flag FS_XFLAG_DAX that can be
>    changed at any time.  The flag state is copied into any files or
>    subdirectories when they are created within that directory.  If programs
>    require file access runs in S_DAX mode, they'll have to create those files

"...they must create..."

>    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
>    appropriate dax mount option.

Otherwise seems ok to me.

--D

> 
> 
> ???
> 
> Ira
> 

  parent reply index

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-27  5:24 ira.weiny
2020-02-27  5:24 ` [PATCH V5 01/12] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-02-27 17:25   ` Ira Weiny
2020-02-27  5:24 ` [PATCH V5 02/12] fs: Remove unneeded IS_DAX() check ira.weiny
2020-02-27  5:24 ` [PATCH V5 03/12] fs/stat: Define DAX statx attribute ira.weiny
2020-02-27  5:24 ` [PATCH V5 04/12] fs/xfs: Isolate the physical DAX flag from enabled ira.weiny
2020-02-27  5:24 ` [PATCH V5 05/12] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-03-01 22:37   ` Dave Chinner
2020-02-27  5:24 ` [PATCH V5 06/12] fs: Add locking for a dynamic address space operations state ira.weiny
2020-03-02  1:26   ` Dave Chinner
2020-03-02  1:36     ` Dave Chinner
2020-02-27  5:24 ` [PATCH V5 07/12] fs: Prevent DAX state change if file is mmap'ed ira.weiny
2020-02-27  5:24 ` [PATCH V5 08/12] fs/xfs: Hold off aops users while changing DAX state ira.weiny
2020-02-27  5:24 ` [PATCH V5 09/12] fs/xfs: Clean up locking in dax invalidate ira.weiny
2020-02-27  5:24 ` [PATCH V5 10/12] fs/xfs: Allow toggle of effective DAX flag ira.weiny
2020-02-27  5:24 ` [PATCH V5 11/12] fs/xfs: Remove xfs_diflags_to_linux() ira.weiny
2020-02-27  5:24 ` [PATCH V5 12/12] Documentation/dax: Update Usage section ira.weiny
2020-03-05 15:51 ` [PATCH V5 00/12] Enable per-file/per-directory DAX operations V5 Christoph Hellwig
2020-03-09 17:04   ` Ira Weiny
2020-03-11  3:36     ` Darrick J. Wong
2020-03-11  6:29       ` Christoph Hellwig
2020-03-11 17:07         ` Dan Williams
2020-03-16  9:52           ` Jan Kara
2020-03-16  9:55             ` Christoph Hellwig
2020-04-01  4:00               ` Darrick J. Wong
2020-04-01 10:25                 ` Jan Kara
2020-04-02  8:53                   ` Christoph Hellwig
2020-04-02 20:55                     ` Ira Weiny
2020-04-03  7:27                       ` Christoph Hellwig
2020-04-03 15:48                         ` Ira Weiny
2020-04-03 17:03                           ` Jan Kara
2020-04-03 18:18                             ` Ira Weiny
2020-04-03 18:21                               ` Ira Weiny
2020-04-03 18:37                               ` Darrick J. Wong [this message]
2020-04-05  6:19                                 ` Ira Weiny
2020-04-06 10:00                               ` Jan Kara
2020-04-03 18:29                             ` Darrick J. Wong
2020-04-03 16:05                         ` Darrick J. Wong
2020-04-03  4:39                 ` Ira Weiny
2020-03-11  6:39       ` Dave Chinner
2020-03-11  6:44         ` Christoph Hellwig
2020-03-11 17:07           ` Dan Williams
2020-03-12  0:49           ` Dave Chinner
2020-03-12  3:00             ` Darrick J. Wong
2020-03-12  7:27             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200403183746.GQ80283@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git