From: Dave Chinner <david@fromorbit.com> To: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org Subject: Re: [PATCH 2 2/2] xfs: fix rt_dev usage for DAX Date: Fri, 2 Feb 2018 14:36:17 +1100 [thread overview] Message-ID: <20180202033617.g23puq2xbwy4wmug@destitution> (raw) In-Reply-To: <20180202004332.GZ4849@magnolia> On Thu, Feb 01, 2018 at 04:43:32PM -0800, Darrick J. Wong wrote: > On Fri, Feb 02, 2018 at 10:44:13AM +1100, Dave Chinner wrote: > > On Thu, Feb 01, 2018 at 01:33:05PM -0700, Dave Jiang wrote: > > > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > > Reported-by: Darrick Wong <darrick.wong@oracle.com> > > > --- > > > fs/xfs/xfs_iops.c | 3 ++- > > > fs/xfs/xfs_super.c | 9 ++++++++- > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > > > index 56475fcd76f2..ab352c325301 100644 > > > --- a/fs/xfs/xfs_iops.c > > > +++ b/fs/xfs/xfs_iops.c > > > @@ -1204,7 +1204,8 @@ xfs_diflags_to_iflags( > > > ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE && > > > !xfs_is_reflink_inode(ip) && > > > (ip->i_mount->m_flags & XFS_MOUNT_DAX || > > > - ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) > > > + ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) && > > > + blk_queue_dax(bdev_get_queue(inode->i_sb->s_bdev))) > > > > This does not discriminate between the rtdev or the data dev. This > > needs to call xfs_find_bdev_for_inode() to get the right device > > for the inode config. > > > > Further, if we add or remove the RT flag to the inode at a later > > point in time (e.g. via ioctl) we also need to re-evaluate the S_DAX > > flag at that point in time. > > Ah, right, I'd missed that subtlety in my earlier replies. Ok, add > another patch to this series to reevaluate S_DAX when we change the RT > flag. > > > Which brings me to the real problem here: dynamically changing the > > S_DAX flag is racy, dangerous and broken. It's not clear that this > > should be allowed at all as the inode may have already been mmap()d > > by the time the ioctl is called to set/clear the rt file state. > > Agreed that this is a mess. Either this needs to get fixed in the dax > code, or we need to decide that we're not going to support reconfiguring > the dax flag at all, except possibly for empty files (similar to how we > restrict changes to the rt flag). Hmmm, the rt flag limitations are to do with whether extents have been allocated or not. IIRC the issue with S_DAX is whether the file has been mmap()d or not which we can't check for reliably even if the file is empty... ..... > > > if (mp->m_flags & XFS_MOUNT_DAX) { > > > + bool rtdev_is_dax = false; > > > + > > > xfs_warn(mp, > > > "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"); > > > > > > + if (mp->m_rtdev_targp->bt_daxdev) > > > + if (bdev_dax_supported(mp->m_rtdev_targp->bt_bdev, > > > + sb->s_blocksize) == 0) > > > + rtdev_is_dax = true; > > > > .... as this code here needs to turn off DAX here if any device > > in the filesystem doesn't support DAX.... > > I think it'd be useful to be able to have a pmem rt device even if the > data device doesn't support it. Or rather, I have a few clients who > have expressed interest in this sort of configuration. Understood. I suspect all this means we can only support DAX on the rt device by setting the RT flag on file create, at least until the dynamic S_DAX issues are sorted out. And that also means it can't be cleared from files that have it set. So perhaps all we need to do is disallow changing the RT flag on regular files via the ioctl if XFS_MOUNT_DAX is set? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com> To: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Dave Jiang <dave.jiang@intel.com>, linux-xfs@vger.kernel.org, ross.zwisler@linux.intel.com, linux-ext4@vger.kernel.org, dan.j.williams@intel.com, linux-nvdimm@lists.01.org Subject: Re: [PATCH 2 2/2] xfs: fix rt_dev usage for DAX Date: Fri, 2 Feb 2018 14:36:17 +1100 [thread overview] Message-ID: <20180202033617.g23puq2xbwy4wmug@destitution> (raw) In-Reply-To: <20180202004332.GZ4849@magnolia> On Thu, Feb 01, 2018 at 04:43:32PM -0800, Darrick J. Wong wrote: > On Fri, Feb 02, 2018 at 10:44:13AM +1100, Dave Chinner wrote: > > On Thu, Feb 01, 2018 at 01:33:05PM -0700, Dave Jiang wrote: > > > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > > Reported-by: Darrick Wong <darrick.wong@oracle.com> > > > --- > > > fs/xfs/xfs_iops.c | 3 ++- > > > fs/xfs/xfs_super.c | 9 ++++++++- > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > > > index 56475fcd76f2..ab352c325301 100644 > > > --- a/fs/xfs/xfs_iops.c > > > +++ b/fs/xfs/xfs_iops.c > > > @@ -1204,7 +1204,8 @@ xfs_diflags_to_iflags( > > > ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE && > > > !xfs_is_reflink_inode(ip) && > > > (ip->i_mount->m_flags & XFS_MOUNT_DAX || > > > - ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) > > > + ip->i_d.di_flags2 & XFS_DIFLAG2_DAX) && > > > + blk_queue_dax(bdev_get_queue(inode->i_sb->s_bdev))) > > > > This does not discriminate between the rtdev or the data dev. This > > needs to call xfs_find_bdev_for_inode() to get the right device > > for the inode config. > > > > Further, if we add or remove the RT flag to the inode at a later > > point in time (e.g. via ioctl) we also need to re-evaluate the S_DAX > > flag at that point in time. > > Ah, right, I'd missed that subtlety in my earlier replies. Ok, add > another patch to this series to reevaluate S_DAX when we change the RT > flag. > > > Which brings me to the real problem here: dynamically changing the > > S_DAX flag is racy, dangerous and broken. It's not clear that this > > should be allowed at all as the inode may have already been mmap()d > > by the time the ioctl is called to set/clear the rt file state. > > Agreed that this is a mess. Either this needs to get fixed in the dax > code, or we need to decide that we're not going to support reconfiguring > the dax flag at all, except possibly for empty files (similar to how we > restrict changes to the rt flag). Hmmm, the rt flag limitations are to do with whether extents have been allocated or not. IIRC the issue with S_DAX is whether the file has been mmap()d or not which we can't check for reliably even if the file is empty... ..... > > > if (mp->m_flags & XFS_MOUNT_DAX) { > > > + bool rtdev_is_dax = false; > > > + > > > xfs_warn(mp, > > > "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"); > > > > > > + if (mp->m_rtdev_targp->bt_daxdev) > > > + if (bdev_dax_supported(mp->m_rtdev_targp->bt_bdev, > > > + sb->s_blocksize) == 0) > > > + rtdev_is_dax = true; > > > > .... as this code here needs to turn off DAX here if any device > > in the filesystem doesn't support DAX.... > > I think it'd be useful to be able to have a pmem rt device even if the > data device doesn't support it. Or rather, I have a few clients who > have expressed interest in this sort of configuration. Understood. I suspect all this means we can only support DAX on the rt device by setting the RT flag on file create, at least until the dynamic S_DAX issues are sorted out. And that also means it can't be cleared from files that have it set. So perhaps all we need to do is disallow changing the RT flag on regular files via the ioctl if XFS_MOUNT_DAX is set? Cheers, Dave. -- Dave Chinner david@fromorbit.com
next prev parent reply other threads:[~2018-02-02 3:35 UTC|newest] Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-02-01 20:32 [PATCH 2 1/2] dax: change bdev_dax_supported() to take a block_device as input Dave Jiang 2018-02-01 20:32 ` Dave Jiang 2018-02-01 20:32 ` Dave Jiang 2018-02-01 20:33 ` [PATCH 2 2/2] xfs: fix rt_dev usage for DAX Dave Jiang 2018-02-01 20:33 ` Dave Jiang 2018-02-01 23:28 ` Darrick J. Wong 2018-02-01 23:28 ` Darrick J. Wong 2018-02-01 23:28 ` Darrick J. Wong 2018-02-02 0:08 ` Dave Jiang 2018-02-02 0:08 ` Dave Jiang 2018-02-02 0:08 ` Dave Jiang 2018-02-02 0:38 ` Darrick J. Wong 2018-02-02 0:38 ` Darrick J. Wong 2018-02-02 0:38 ` Darrick J. Wong 2018-02-01 23:44 ` Dave Chinner 2018-02-01 23:44 ` Dave Chinner 2018-02-02 0:13 ` Dave Jiang 2018-02-02 0:13 ` Dave Jiang 2018-02-02 0:13 ` Dave Jiang 2018-02-02 3:20 ` Dave Chinner 2018-02-02 3:20 ` Dave Chinner 2018-02-02 3:20 ` Dave Chinner 2018-02-02 0:43 ` Darrick J. Wong 2018-02-02 0:43 ` Darrick J. Wong 2018-02-02 0:43 ` Darrick J. Wong 2018-02-02 3:36 ` Dave Chinner [this message] 2018-02-02 3:36 ` Dave Chinner 2018-02-06 22:32 ` Dave Jiang 2018-02-06 22:32 ` Dave Jiang 2018-02-06 22:32 ` Dave Jiang 2018-02-06 23:19 ` Darrick J. Wong 2018-02-06 23:19 ` Darrick J. Wong 2018-02-07 0:19 ` Dan Williams 2018-02-07 0:19 ` Dan Williams 2018-02-07 0:19 ` Dan Williams 2018-03-06 0:06 ` Ross Zwisler 2018-03-06 0:06 ` Ross Zwisler 2018-03-06 0:06 ` Ross Zwisler 2018-02-01 22:46 ` [PATCH 2 1/2] dax: change bdev_dax_supported() to take a block_device as input Darrick J. Wong 2018-02-01 22:46 ` Darrick J. Wong
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180202033617.g23puq2xbwy4wmug@destitution \ --to=david@fromorbit.com \ --cc=darrick.wong@oracle.com \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-nvdimm@lists.01.org \ --cc=linux-xfs@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.