linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] vfio, dax: prevent long term filesystem-dax pins and other fixes
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Jane Chu, Haozhong Zhang, Michal Hocko, Jan Kara, kvm,
	Darrick J. Wong, linux-kernel, stable, linux-xfs, linux-mm,
	Alex Williamson, Gerd Rausch, Alexander Viro, linux-fsdevel,
	kbuild test robot, Christoph Hellwig

Changes since v2 [1]:

* Fix yet more compile breakage in the FS_DAX=n and DEV_DAX=y case.
  (0day robot)

[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014046.html

---

The vfio interface, like RDMA, wants to set up long term (indefinite)
pins of the pages backing an address range so that a guest or userspace
driver can perform DMA to the pinned physical addresses. Given that this
pinning may lead to filesystem operations deadlocking in the
filesystem-dax case, the pinning request needs to be rejected.

The longer term fix for vfio, RDMA, and any other long term pin user is
to provide a 'pin with lease' mechanism. Similar to the leases that are
held for pNFS RDMA layouts, this userspace lease gives the kernel a way
to notify userspace that the block layout of the file is changing and
the kernel is revoking access to pinned pages.

---

Dan Williams (6):
      dax: fix vma_is_fsdax() helper
      dax: fix dax_mapping() definition in the FS_DAX=n + DEV_DAX=y case
      xfs, dax: introduce IS_FSDAX()
      dax: fix S_DAX definition
      dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
      vfio: disable filesystem-dax page pinning


 drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
 fs/xfs/xfs_file.c               |   14 +++++++-------
 fs/xfs/xfs_ioctl.c              |    4 ++--
 fs/xfs/xfs_iomap.c              |    6 +++---
 fs/xfs/xfs_reflink.c            |    2 +-
 include/linux/dax.h             |    9 ++++++---
 include/linux/fs.h              |    8 ++++++--
 7 files changed, 40 insertions(+), 21 deletions(-)

* [PATCH v3 1/6] dax: fix vma_is_fsdax() helper
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Jane Chu, Haozhong Zhang, linux-kernel, stable, linux-mm,
	Gerd Rausch, linux-fsdevel

Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.

Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm")
Cc: <stable@vger.kernel.org>
Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fs.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..79c413985305 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3198,7 +3198,7 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma)
 	if (!vma_is_dax(vma))
 		return false;
 	inode = file_inode(vma->vm_file);
-	if (inode->i_mode == S_IFCHR)
+	if (S_ISCHR(inode->i_mode))
 		return false; /* device-dax */
 	return true;
 }

* [PATCH v3 2/6] dax: fix dax_mapping() definition in the FS_DAX=n + DEV_DAX=y case
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Jan Kara, linux-kernel, stable, linux-mm, Alexander Viro,
	linux-fsdevel, kbuild test robot, Christoph Hellwig

An address_space only has dax exceptional entries when FS_DAX is
enabled. The current reliance on S_DAX causes compile failures when
S_DAX is defined (non-zero) for DEV_DAX but FS_DAX is disabled. Make
dax_mapping() always return false in the FS_DAX=n case so that
mm/truncate.c drops its link-time dependencies on fs/dax.c.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/dax.h |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0185ecdae135..62e8cf7eb566 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -107,6 +107,10 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length);
+static inline bool dax_mapping(struct address_space *mapping)
+{
+	return mapping->host && IS_DAX(mapping->host);
+}
 #else
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
@@ -114,12 +118,11 @@ static inline int __dax_zero_page_range(struct block_device *bdev,
 {
 	return -ENXIO;
 }
-#endif
-
 static inline bool dax_mapping(struct address_space *mapping)
 {
-	return mapping->host && IS_DAX(mapping->host);
+	return false;
 }
+#endif
 
 struct writeback_control;
 int dax_writeback_mapping_range(struct address_space *mapping,
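
The #ifdef / #else stub pattern used above can be modeled in userspace as follows. This is a sketch only: SKETCH_FS_DAX stands in for the CONFIG_FS_DAX Kconfig symbol, and the sketch_* types are hypothetical reductions of struct inode and struct address_space:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_FS_DAX; set to 1 to model FS_DAX=y. */
#define SKETCH_FS_DAX 0
/* Non-zero because this sketch models DEV_DAX=y. */
#define SKETCH_S_DAX 8192

struct sketch_inode { unsigned int i_flags; };
struct sketch_mapping { struct sketch_inode *host; };

#if SKETCH_FS_DAX
static bool sketch_dax_mapping(const struct sketch_mapping *mapping)
{
	return mapping->host && (mapping->host->i_flags & SKETCH_S_DAX);
}
#else
/* Without filesystem-dax no address_space can hold dax entries, so the
 * stub returns a constant regardless of the inode flags. */
static bool sketch_dax_mapping(const struct sketch_mapping *mapping)
{
	(void)mapping;
	return false;
}
#endif
```

Because the FS_DAX=n variant is a constant, callers guarded by it no longer need the fs/dax.c symbols at link time, even when the host inode carries a non-zero S_DAX.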

* [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Darrick J. Wong, linux-kernel, stable, linux-xfs, linux-mm,
	linux-fsdevel, kbuild test robot

Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
mechanism besides the plain IS_DAX() check is needed to compile out dead
filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
with:

    ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
    ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
    ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!

This compile failure was previously hidden by the fact that S_DAX was
erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.

Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/xfs/xfs_file.c    |   14 +++++++-------
 fs/xfs/xfs_ioctl.c   |    4 ++--
 fs/xfs/xfs_iomap.c   |    6 +++---
 fs/xfs/xfs_reflink.c |    2 +-
 include/linux/fs.h   |    2 ++
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9ea08326f876..46a098b90fd0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -288,7 +288,7 @@ xfs_file_read_iter(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (IS_DAX(inode))
+	if (IS_FSDAX(inode))
 		ret = xfs_file_dax_read(iocb, to);
 	else if (iocb->ki_flags & IOCB_DIRECT)
 		ret = xfs_file_dio_aio_read(iocb, to);
@@ -726,7 +726,7 @@ xfs_file_write_iter(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	if (IS_DAX(inode))
+	if (IS_FSDAX(inode))
 		ret = xfs_file_dax_write(iocb, from);
 	else if (iocb->ki_flags & IOCB_DIRECT) {
 		/*
@@ -1045,7 +1045,7 @@ __xfs_filemap_fault(
 	}
 
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-	if (IS_DAX(inode)) {
+	if (IS_FSDAX(inode)) {
 		pfn_t pfn;
 
 		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
@@ -1070,7 +1070,7 @@ xfs_filemap_fault(
 {
 	/* DAX can shortcut the normal fault path on write faults! */
 	return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
-			IS_DAX(file_inode(vmf->vma->vm_file)) &&
+			IS_FSDAX(file_inode(vmf->vma->vm_file)) &&
 			(vmf->flags & FAULT_FLAG_WRITE));
 }
 
@@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault(
 	struct vm_fault		*vmf,
 	enum page_entry_size	pe_size)
 {
-	if (!IS_DAX(file_inode(vmf->vma->vm_file)))
+	if (!IS_FSDAX(file_inode(vmf->vma->vm_file)))
 		return VM_FAULT_FALLBACK;
 
 	/* DAX can shortcut the normal fault path on write faults! */
@@ -1124,12 +1124,12 @@ xfs_file_mmap(
 	 * We don't support synchronous mappings for non-DAX files. At least
 	 * until someone comes with a sensible use case.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
 	vma->vm_ops = &xfs_file_vm_ops;
-	if (IS_DAX(file_inode(filp)))
+	if (IS_FSDAX(file_inode(filp)))
 		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
 	return 0;
 }
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 89fb1eb80aae..234279ff66ce 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate(
 	}
 
 	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
+	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode))
 		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
+	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode))
 		return 0;
 
 	/* lock, flush and invalidate mapping in preparation for flag change */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 66e1edbfb2b2..cf794d429aec 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -241,7 +241,7 @@ xfs_iomap_write_direct(
 	 * the reserve block pool for bmbt block allocation if there is no space
 	 * left but we need to do unwritten extent conversion.
 	 */
-	if (IS_DAX(VFS_I(ip))) {
+	if (IS_FSDAX(VFS_I(ip))) {
 		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
 		if (imap->br_state == XFS_EXT_UNWRITTEN) {
 			tflags |= XFS_TRANS_RESERVE;
@@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode,
 	return !nimaps ||
 		imap->br_startblock == HOLESTARTBLOCK ||
 		imap->br_startblock == DELAYSTARTBLOCK ||
-		(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
+		(IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
 }
 
 static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags)
@@ -988,7 +988,7 @@ xfs_file_iomap_begin(
 		return -EIO;
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
-			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+			!IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
 	}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 270246943a06..a126e00e05e3 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1351,7 +1351,7 @@ xfs_reflink_remap_range(
 		goto out_unlock;
 
 	/* Don't share DAX file data for now. */
-	if (IS_DAX(inode_in) || IS_DAX(inode_out))
+	if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out))
 		goto out_unlock;
 
 	ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79c413985305..a4310a95011b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1909,6 +1909,8 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
 
+#define IS_FSDAX(inode) (IS_ENABLED(CONFIG_FS_DAX) && IS_DAX(inode))
+
 static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 {
 	return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
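
The difference between IS_DAX() and IS_FSDAX() in the FS_DAX=n + DEV_DAX=y configuration can be sketched as below. The SKETCH_* macros are hypothetical stand-ins for IS_ENABLED(CONFIG_FS_DAX), S_DAX and the two inode macros, and the read-path selector only loosely mirrors xfs_file_read_iter():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of FS_DAX=n + DEV_DAX=y: S_DAX is non-zero, but the
 * filesystem-dax code is not built. */
#define SKETCH_FS_DAX_ENABLED 0
#define SKETCH_S_DAX 8192

struct sketch_inode { unsigned int i_flags; };

enum sketch_read_path { SKETCH_PATH_BUFFERED, SKETCH_PATH_DAX };

/* Plain flag test, like IS_DAX(). */
#define SKETCH_IS_DAX(inode) (((inode)->i_flags & SKETCH_S_DAX) != 0)
/* Like IS_FSDAX(): the constant left operand lets the compiler fold the
 * test to false and discard the branch that would call into fs/dax.c. */
#define SKETCH_IS_FSDAX(inode) (SKETCH_FS_DAX_ENABLED && SKETCH_IS_DAX(inode))

static enum sketch_read_path
sketch_pick_read_path(const struct sketch_inode *inode)
{
	if (SKETCH_IS_FSDAX(inode))
		return SKETCH_PATH_DAX;		/* e.g. xfs_file_dax_read() */
	return SKETCH_PATH_BUFFERED;
}
```

An inode with S_DAX set still satisfies the IS_DAX()-style test here, which is exactly why the plain check is no longer enough to compile the filesystem-dax branches away.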

* [PATCH v3 4/6] dax: fix S_DAX definition
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Jan Kara, linux-kernel, stable, linux-mm, Alexander Viro,
	linux-fsdevel, Christoph Hellwig

Make sure S_DAX is defined to a non-zero value in the CONFIG_FS_DAX=n +
CONFIG_DEV_DAX=y case. Otherwise vma_is_dax() incorrectly returns false
for Device-DAX mappings.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fs.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a4310a95011b..7418341578a3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1859,7 +1859,7 @@ struct super_operations {
 #define S_IMA		1024	/* Inode has an associated IMA struct */
 #define S_AUTOMOUNT	2048	/* Automount/referral quasi-directory */
 #define S_NOSEC		4096	/* no suid or xattr security attributes */
-#ifdef CONFIG_FS_DAX
+#if IS_ENABLED(CONFIG_FS_DAX) || IS_ENABLED(CONFIG_DEV_DAX)
 #define S_DAX		8192	/* Direct Access, avoiding the page cache */
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
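
The preprocessor selection above can be sketched in userspace with plain 0/1 macros. SKETCH_FS_DAX and SKETCH_DEV_DAX are hypothetical stand-ins for the Kconfig results (the kernel uses IS_ENABLED(CONFIG_...) instead), and sketch_inode_is_dax() is a simplified flag test, not vma_is_dax() itself:

```c
#include <assert.h>

/* Hypothetical Kconfig results: FS_DAX=n, DEV_DAX=y. */
#define SKETCH_FS_DAX  0
#define SKETCH_DEV_DAX 1

#if SKETCH_FS_DAX || SKETCH_DEV_DAX
#define SKETCH_S_DAX 8192	/* Direct Access, avoiding the page cache */
#else
#define SKETCH_S_DAX 0		/* Make all the DAX code disappear */
#endif

struct sketch_inode { unsigned int i_flags; };

/* With SKETCH_S_DAX == 0 this would always return 0, which is how a
 * device-dax mapping could be misreported as non-dax. */
static int sketch_inode_is_dax(const struct sketch_inode *inode)
{
	return (inode->i_flags & SKETCH_S_DAX) != 0;
}
```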

* [PATCH v3 5/6] dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Jan Kara, linux-kernel, linux-mm, Alexander Viro, linux-fsdevel,
	Christoph Hellwig

Do not bother looking up the file type in the case when Filesystem-DAX
is disabled at build time.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/fs.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7418341578a3..c97fc4dbaae1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3197,6 +3197,8 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma)
 
 	if (!vma->vm_file)
 		return false;
+	if (!IS_ENABLED(CONFIG_FS_DAX))
+		return false;
 	if (!vma_is_dax(vma))
 		return false;
 	inode = file_inode(vma->vm_file);

* [PATCH v3 6/6] vfio: disable filesystem-dax page pinning
From: Dan Williams @ 2018-02-24  0:43 UTC
  To: linux-nvdimm
  Cc: Haozhong Zhang, Michal Hocko, kvm, linux-kernel, stable,
	linux-mm, Alex Williamson, linux-fsdevel, Christoph Hellwig

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection, a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. To prevent userspace from holding off
filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX
mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..45657e2b1ff7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct vm_area_struct *vmas[1];
 	int ret;
 
 	if (mm == current->mm) {
-		ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
-					  page);
+		ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
+					      page, vmas);
 	} else {
 		unsigned int flags = 0;
 
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 		down_read(&mm->mmap_sem);
 		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    NULL, NULL);
+					    vmas, NULL);
+		/*
+		 * The lifetime of a vaddr_get_pfn() page pin is
+		 * userspace-controlled. In the fs-dax case this could
+		 * lead to indefinite stalls in filesystem operations.
+		 * Disallow attempts to pin fs-dax pages via this
+		 * interface.
+		 */
+		if (ret > 0 && vma_is_fsdax(vmas[0])) {
+			ret = -EOPNOTSUPP;
+			put_page(page[0]);
+		}
 		up_read(&mm->mmap_sem);
 	}
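
The remote-pin path above follows a pin, check, then release pattern. Here it is as a self-contained userspace sketch. The page and vma types, the refcount field, and sketch_gup() are hypothetical mocks of struct page, put_page() and get_user_pages_remote(). The real code runs the check while mmap_sem is held for read, because the returned vma pointers are only stable under that lock:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sketch_page { int refcount; };
struct sketch_vma { bool is_fsdax; };

/* Mock of get_user_pages_remote(): takes one reference and reports the
 * vma the page came from. Returns the number of pages pinned. */
static int sketch_gup(struct sketch_page *page, struct sketch_vma *vma,
		      struct sketch_page **pages, struct sketch_vma **vmas)
{
	page->refcount++;
	pages[0] = page;
	vmas[0] = vma;
	return 1;
}

static int sketch_vaddr_get_page(struct sketch_page *page,
				 struct sketch_vma *vma)
{
	struct sketch_page *pages[1];
	struct sketch_vma *vmas[1];
	int ret = sketch_gup(page, vma, pages, vmas);

	/* After a successful pin, reject fs-dax and drop the reference
	 * that was just taken, as the vfio hunk does with put_page(). */
	if (ret > 0 && vmas[0]->is_fsdax) {
		ret = -EOPNOTSUPP;
		pages[0]->refcount--;
	}
	return ret;
}
```

The key design point is that a rejected pin must not leak its reference, so the error path undoes the pin before returning -EOPNOTSUPP.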
 

* Re: [PATCH v3 1/6] dax: fix vma_is_fsdax() helper
From: Jan Kara @ 2018-02-26  9:51 UTC
  To: Dan Williams
  Cc: linux-nvdimm, Jane Chu, Haozhong Zhang, linux-kernel, stable,
	linux-mm, Gerd Rausch, linux-fsdevel

On Fri 23-02-18 16:43:11, Dan Williams wrote:
> Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
> S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
> device-dax instances when those are meant to be explicitly allowed.
> 
> Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm")
> Cc: <stable@vger.kernel.org>
> Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
> Acked-by: Jane Chu <jane.chu@oracle.com>
> Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I wonder how I didn't notice this when reading the original patch. Anyway
the fix looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Jan Kara @ 2018-02-26 10:06 UTC
  To: Dan Williams
  Cc: linux-nvdimm, Darrick J. Wong, linux-kernel, stable, linux-xfs,
	linux-mm, linux-fsdevel, kbuild test robot

On Fri 23-02-18 16:43:27, Dan Williams wrote:
> Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
> mechanism besides the plain IS_DAX() check is needed to compile out dead
> filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
> with:
> 
>     ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
>     ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
>     ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!
> 
> This compile failure was previously hidden by the fact that S_DAX was
> erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.
> 
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: linux-xfs@vger.kernel.org
> Cc: <stable@vger.kernel.org>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

As much as I appreciate that relying on the compiler to optimize out dead
branches results in nicer-looking code, this is an example where it
backfires. Also, having IS_DAX() and IS_FSDAX() doing almost the same, but
not exactly the same, thing is IMHO a recipe for confusion (e.g. a casual
reader could wonder why ext4 gets away with using IS_DAX while XFS has to
use IS_FSDAX). So I'd just prefer to handle this as is usual in other kernel
areas: define empty stubs for all exported functions when CONFIG_FS_DAX is
not enabled. That way the code can stay free of ugly ifdefs and we don't
have to bother with the IS_FSDAX vs IS_DAX distinction in filesystem code.
Thoughts?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v3 5/6] dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
From: Jan Kara @ 2018-02-26 10:08 UTC
  To: Dan Williams
  Cc: linux-nvdimm, Jan Kara, linux-kernel, linux-mm, Alexander Viro,
	linux-fsdevel, Christoph Hellwig

On Fri 23-02-18 16:43:37, Dan Williams wrote:
> Do not bother looking up the file type in the case when Filesystem-DAX
> is disabled at build time.
> 
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Dan Williams @ 2018-02-26 15:48 UTC
  To: Jan Kara
  Cc: linux-nvdimm, Darrick J. Wong, Linux Kernel Mailing List, stable,
	linux-xfs, Linux MM, linux-fsdevel, kbuild test robot

On Mon, Feb 26, 2018 at 2:06 AM, Jan Kara <jack@suse.cz> wrote:
> On Fri 23-02-18 16:43:27, Dan Williams wrote:
>> Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
>> mechanism besides the plain IS_DAX() check is needed to compile out dead
>> filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
>> with:
>>
>>     ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
>>     ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
>>     ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!
>>
>> This compile failure was previously hidden by the fact that S_DAX was
>> erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.
>>
>> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
>> Cc: linux-xfs@vger.kernel.org
>> Cc: <stable@vger.kernel.org>
>> Reported-by: kbuild test robot <fengguang.wu@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> As much as I appreciate that relying on the compiler to optimize out dead
> branches results in nicer-looking code, this is an example where it
> backfires. Also, having IS_DAX() and IS_FSDAX() doing almost the same, but
> not exactly the same, thing is IMHO a recipe for confusion (e.g. a casual
> reader could wonder why ext4 gets away with using IS_DAX while XFS has to
> use IS_FSDAX). So I'd just prefer to handle this as is usual in other kernel
> areas: define empty stubs for all exported functions when CONFIG_FS_DAX is
> not enabled. That way the code can stay free of ugly ifdefs and we don't
> have to bother with the IS_FSDAX vs IS_DAX distinction in filesystem code.
> Thoughts?
>

I think my patch is incomplete either way, because the current
IS_DAX() usages handle more than just compiling out calls to fs/dax.c
symbols. I.e. even if there were stubs for all fs/dax.c call-outs,
there are still local usages of the helper. Let's kill IS_DAX() and
only have IS_FSDAX() and IS_DEVDAX() with the S_ISCHR() check. Any
issues with that?
