LKML Archive on lore.kernel.org
 help / color / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	Jeff Moyer <jmoyer@redhat.com>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
Date: Thu, 9 Apr 2020 09:59:27 -0700
Message-ID: <20200409165927.GD6741@magnolia> (raw)
In-Reply-To: <20200409152944.GA801705@iweiny-DESK2.sc.intel.com>

On Thu, Apr 09, 2020 at 08:29:44AM -0700, Ira Weiny wrote:
> On Wed, Apr 08, 2020 at 05:30:21PM -0700, Darrick J. Wong wrote:
> 
> [snip]
> 
> > 
> > But you're right, this thing keeps swirling around and around and around
> > because we can't ever get to agreement on this.  Maybe I'll just become
> > XFS BOFH MAINTAINER and make a decision like this:
> > 
> >  1 Applications must call statx to discover the current S_DAX state.
> > 
> >  2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
> >    the parent directory FS_XFLAG_DAX inode flag.  This advisory flag can be
> >    changed after file creation, but it does not immediately affect the S_DAX
> >    state.
> > 
> >    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
> >    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> >    Unless overridden...
> > 
> >  3 There exists a dax= mount option.
> > 
> >    "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> >    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
> >         "-o dax" by itself means "dax=always"
> >    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
> 
> per-Dave '-o dax=inode'

Ok.

> > 
> >  4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
> >    changed at any time.  The flag state is copied into any files or
> >    subdirectories when they are created within that directory.
> 
> Good.
> 
> >    If programs
> >    require file access runs in S_DAX mode, they must create those files
> >    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
> >    appropriate dax mount option.
> 
> Why do we need this to be true?  If the FS_XFLAG_DAX flag can be cleared why
> not set it and allow the S_DAX change to occur later just like clearing it?
> The logic is exactly the same.

I think I'll just delete this sentence since I started pushing all that
verbiage towards (5).

To answer your question, yes, FS_XFLAG_DAX can be set on a file at any
time, the same as it can be cleared at any time.  Sorry that was
unclear, I'll fix that for the next draft (below).

> > 
> >  5 Programs that require a specific file access mode (DAX or not DAX) must
> 
> s/must/can/

Ok.

> >    do one of the following:
> > 
> >    (a) create files in directories with the FS_XFLAG_DAX flag set as needed;
> 
> Again if we allow clearing the flag why not setting?  So this is 1 option they
> 'can' do.
> 
> > 
> >    (b) have the administrator set an override via mount option;
> > 
> >    (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
> >        match the S_DAX state (as reported by statx), they must cause the
> >        kernel to evict the inode from memory.  This can be done by:
> > 
> >        i>   closing the file;
> >        ii>  re-opening the file and using statx to see if the fs has
> >             changed the S_DAX flag;
> 
> i and ii need to be 1 step the user must follow.

Ok, I'll combine these.

> >        iii> if not, either unmount and remount the filesystem, or
> >             closing the file and using drop_caches.
> > 
> >  6 I no longer think it's too wild to require that users who want to
> >    squeeze every last bit of performance out of the particular rough and
> >    tumble bits of their storage also be exposed to the difficulties of
> >    what happens when the operating system can't totally virtualize those
> >    hardware capabilities.  Your high performance sports car is not a
> >    Toyota minivan, as it were.
> 
> I'm good with this statement.  But I think we need to clean up the verbiage for
> the documentation...  ;-)

Heh. :)

> Thanks for the summary.  I like these to get everyone on the same page.  :-D

And today:

 1. There exists an in-kernel access mode flag S_DAX that is set when
    file accesses go directly to persistent memory, bypassing the page
    cache.  Applications must call statx to discover the current S_DAX
    state (STATX_ATTR_DAX).

 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
    inherited from the parent directory FS_XFLAG_DAX inode flag at file
    creation time.  This advisory flag can be set or cleared at any
    time, but doing so does not immediately affect the S_DAX state.

    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
    and the fs is on pmem then it will enable S_DAX at inode load time;
    if FS_XFLAG_DAX is not set, it will not enable S_DAX.

 3. There exists a dax= mount option.

    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."

    "-o dax=always" means "always set S_DAX (at least on pmem),
                    and ignore FS_XFLAG_DAX."

    "-o dax"        is an alias for "dax=always".

    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.

 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
    be set or cleared at any time.  The flag state is copied into any
    files or subdirectories when they are created within that directory.

 5. Programs that require a specific file access mode (DAX or not DAX)
    can do one of the following:

    (a) Create files in directories that the FS_XFLAG_DAX flag set as
        needed; or

    (b) Have the administrator set an override via mount option; or

    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
        must then cause the kernel to evict the inode from memory.  This
        can be done by:

        i>  Closing the file and re-opening the file and using statx to
            see if the fs has changed the S_DAX flag; and

        ii> If the file still does not have the desired S_DAX access
            mode, either unmount and remount the filesystem, or close
            the file and use drop_caches.

 6. It's not unreasonable that users who want to squeeze every last bit
    of performance out of the particular rough and tumble bits of their
    storage also be exposed to the difficulties of what happens when the
    operating system can't totally virtualize those hardware
    capabilities.  Your high performance sports car is not a Toyota
    minivan, as it were.

Given our overnight discussions, I don't think it'll be difficult to
hoist XFS_IDONTCACHE to the VFS so that 5.c.i is enough to change the
S_DAX state if nobody else is using the file.

--D

> > 
> > > Furthermore, if we did want an interface like that why not allow
> > > the on-disk flag to be set as well as cleared?
> > 
> > Well, why not - it's why I implemented the flag in the first place!
> > The only problem we have here is how to safely change the in-memory
> > DAX state, and that largely has nothing to do with setting/clearing
> > the on-disk flag....
> 
> With the above change to xfs_diflags_to_iflags() I think we are ok here.
> 
> Ira
> 
--D

> Ira
> 
> > 
> > I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
> > when you change the DAX flag, the VFS will kill the inode the instant
> > the last user close()s the file.  Then 5.c.ii will actually work.
> > 
> > --D
> > 
> > > > 
> > > > > Furthermore, if we did want an interface like that why not allow
> > > > > the on-disk flag to be set as well as cleared?
> > > > 
> > > > Well, why not - it's why I implemented the flag in the first place!
> > > > The only problem we have here is how to safely change the in-memory
> > > > DAX state, and that largely has nothing to do with setting/clearing
> > > > the on-disk flag....
> > > 
> > > With the above change to xfs_diflags_to_iflags() I think we are ok here.
> > > 
> > > Ira
> > > 

  reply index

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-04-07 23:46   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check ira.weiny
2020-04-09  7:31   ` Christoph Hellwig
2020-04-09 14:57     ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 3/8] fs/stat: Define DAX statx attribute ira.weiny
2020-04-07 23:47   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state ira.weiny
2020-04-07 23:59   ` Dave Chinner
2020-04-08  0:09     ` Ira Weiny
2020-04-08  0:48       ` Dave Chinner
2020-04-09 15:03         ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-04-08  0:05   ` Dave Chinner
2020-04-08  0:13     ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
2020-04-08  2:08   ` Dave Chinner
2020-04-08 17:09     ` Ira Weiny
2020-04-08 21:02       ` Dave Chinner
2020-04-08 21:28         ` Dan Williams
2020-04-08 22:10           ` Ira Weiny
2020-04-08 23:58           ` Dave Chinner
2020-04-09  0:22             ` Ira Weiny
2020-04-09 12:41               ` Christoph Hellwig
2020-04-09 20:49                 ` Ira Weiny
2020-04-08 22:07         ` Ira Weiny
2020-04-08 23:21           ` Dave Chinner
2020-04-09  0:12             ` Ira Weiny
2020-04-09  0:30               ` Darrick J. Wong
2020-04-09 15:29                 ` Ira Weiny
2020-04-09 16:59                   ` Darrick J. Wong [this message]
2020-04-09 17:17                     ` Jan Kara
2020-04-09 20:54                     ` Ira Weiny
2020-04-09  0:49               ` Dave Chinner
2020-04-09 12:40                 ` Christoph Hellwig
2020-04-10  0:27                   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
2020-04-08  2:23   ` Dave Chinner
2020-04-08  9:58     ` Jan Kara
2020-04-08 21:09       ` Dave Chinner
2020-04-08 22:26         ` Ira Weiny
2020-04-08 23:48           ` Dave Chinner
2020-04-09 12:28             ` Christoph Hellwig
2020-04-08 15:37   ` Darrick J. Wong
2020-04-08 18:13     ` Ira Weiny
2020-04-16  5:39   ` [fs/xfs] 857c9841f8: xfstests.xfs.046.fail kernel test robot
2020-04-07 18:29 ` [PATCH V6 8/8] Documentation/dax: Update Usage section ira.weiny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200409165927.GD6741@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jmoyer@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git