* [PATCH v3 0/6] vfio, dax: prevent long term filesystem-dax pins and other fixes
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Jane Chu, Michal Hocko, Jan Kara, kvm, Darrick J. Wong,
linux-kernel, stable, linux-xfs, linux-mm, Alex Williamson,
Gerd Rausch, Alexander Viro, linux-fsdevel, kbuild test robot,
Christoph Hellwig
Changes since v2 [1]:
* Fix yet more compile breakage in the FS_DAX=n and DEV_DAX=y case.
(0day robot)
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014046.html
---
The vfio interface, like RDMA, wants to set up long-term (indefinite)
pins of the pages backing an address range so that a guest or userspace
driver can perform DMA to those pages by physical address. Given that
this pinning may lead to filesystem operations deadlocking in the
filesystem-dax case, the pinning request needs to be rejected.
The longer-term fix for vfio, RDMA, and any other long-term pin user is
to provide a 'pin with lease' mechanism. Similar to the leases that are
held for pNFS RDMA layouts, this userspace lease gives the kernel a way
to notify userspace that the block layout of the file is changing and
that the kernel is revoking access to the pinned pages.
---
Dan Williams (6):
dax: fix vma_is_fsdax() helper
dax: fix dax_mapping() definition in the FS_DAX=n + DEV_DAX=y case
xfs, dax: introduce IS_FSDAX()
dax: fix S_DAX definition
dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
vfio: disable filesystem-dax page pinning
drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++---
fs/xfs/xfs_file.c | 14 +++++++-------
fs/xfs/xfs_ioctl.c | 4 ++--
fs/xfs/xfs_iomap.c | 6 +++---
fs/xfs/xfs_reflink.c | 2 +-
include/linux/dax.h | 9 ++++++---
include/linux/fs.h | 8 ++++++--
7 files changed, 40 insertions(+), 21 deletions(-)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
* [PATCH v3 1/6] dax: fix vma_is_fsdax() helper
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Jane Chu, linux-kernel, stable, linux-mm, Gerd Rausch, linux-fsdevel
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm")
Cc: <stable@vger.kernel.org>
Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2a815560fda0..79c413985305 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3198,7 +3198,7 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma)
if (!vma_is_dax(vma))
return false;
inode = file_inode(vma->vm_file);
- if (inode->i_mode == S_IFCHR)
+ if (S_ISCHR(inode->i_mode))
return false; /* device-dax */
return true;
}
* [PATCH v3 2/6] dax: fix dax_mapping() definition in the FS_DAX=n + DEV_DAX=y case
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Jan Kara, linux-kernel, stable, linux-mm, Alexander Viro,
linux-fsdevel, kbuild test robot, Christoph Hellwig
An address_space will only have dax exceptional entries when FS_DAX is
enabled. The current reliance on S_DAX causes compile failures when
S_DAX is defined for DEV_DAX but FS_DAX is disabled. Make dax_mapping()
always return false in the FS_DAX=n case so that mm/truncate.c drops
its link-time dependencies on fs/dax.c.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/dax.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0185ecdae135..62e8cf7eb566 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -107,6 +107,10 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
int __dax_zero_page_range(struct block_device *bdev,
struct dax_device *dax_dev, sector_t sector,
unsigned int offset, unsigned int length);
+static inline bool dax_mapping(struct address_space *mapping)
+{
+ return mapping->host && IS_DAX(mapping->host);
+}
#else
static inline int __dax_zero_page_range(struct block_device *bdev,
struct dax_device *dax_dev, sector_t sector,
@@ -114,12 +118,11 @@ static inline int __dax_zero_page_range(struct block_device *bdev,
{
return -ENXIO;
}
-#endif
-
static inline bool dax_mapping(struct address_space *mapping)
{
- return mapping->host && IS_DAX(mapping->host);
+ return false;
}
+#endif
struct writeback_control;
int dax_writeback_mapping_range(struct address_space *mapping,
* [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Darrick J. Wong, linux-kernel, stable, linux-xfs, linux-mm,
linux-fsdevel, kbuild test robot
Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
mechanism besides the plain IS_DAX() check is needed to compile out dead
filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
with:
ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!
This compile failure was previously hidden by the fact that S_DAX was
erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/xfs/xfs_file.c | 14 +++++++-------
fs/xfs/xfs_ioctl.c | 4 ++--
fs/xfs/xfs_iomap.c | 6 +++---
fs/xfs/xfs_reflink.c | 2 +-
include/linux/fs.h | 2 ++
5 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9ea08326f876..46a098b90fd0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -288,7 +288,7 @@ xfs_file_read_iter(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
- if (IS_DAX(inode))
+ if (IS_FSDAX(inode))
ret = xfs_file_dax_read(iocb, to);
else if (iocb->ki_flags & IOCB_DIRECT)
ret = xfs_file_dio_aio_read(iocb, to);
@@ -726,7 +726,7 @@ xfs_file_write_iter(
if (XFS_FORCED_SHUTDOWN(ip->i_mount))
return -EIO;
- if (IS_DAX(inode))
+ if (IS_FSDAX(inode))
ret = xfs_file_dax_write(iocb, from);
else if (iocb->ki_flags & IOCB_DIRECT) {
/*
@@ -1045,7 +1045,7 @@ __xfs_filemap_fault(
}
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- if (IS_DAX(inode)) {
+ if (IS_FSDAX(inode)) {
pfn_t pfn;
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
@@ -1070,7 +1070,7 @@ xfs_filemap_fault(
{
/* DAX can shortcut the normal fault path on write faults! */
return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
- IS_DAX(file_inode(vmf->vma->vm_file)) &&
+ IS_FSDAX(file_inode(vmf->vma->vm_file)) &&
(vmf->flags & FAULT_FLAG_WRITE));
}
@@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault(
struct vm_fault *vmf,
enum page_entry_size pe_size)
{
- if (!IS_DAX(file_inode(vmf->vma->vm_file)))
+ if (!IS_FSDAX(file_inode(vmf->vma->vm_file)))
return VM_FAULT_FALLBACK;
/* DAX can shortcut the normal fault path on write faults! */
@@ -1124,12 +1124,12 @@ xfs_file_mmap(
* We don't support synchronous mappings for non-DAX files. At least
* until someone comes with a sensible use case.
*/
- if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+ if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
- if (IS_DAX(file_inode(filp)))
+ if (IS_FSDAX(file_inode(filp)))
vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
return 0;
}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 89fb1eb80aae..234279ff66ce 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate(
}
/* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
+ if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode))
return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
+ if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode))
return 0;
/* lock, flush and invalidate mapping in preparation for flag change */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 66e1edbfb2b2..cf794d429aec 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -241,7 +241,7 @@ xfs_iomap_write_direct(
* the reserve block pool for bmbt block allocation if there is no space
* left but we need to do unwritten extent conversion.
*/
- if (IS_DAX(VFS_I(ip))) {
+ if (IS_FSDAX(VFS_I(ip))) {
bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
if (imap->br_state == XFS_EXT_UNWRITTEN) {
tflags |= XFS_TRANS_RESERVE;
@@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode,
return !nimaps ||
imap->br_startblock == HOLESTARTBLOCK ||
imap->br_startblock == DELAYSTARTBLOCK ||
- (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
+ (IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
}
static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags)
@@ -988,7 +988,7 @@ xfs_file_iomap_begin(
return -EIO;
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
- !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+ !IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */
return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 270246943a06..a126e00e05e3 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1351,7 +1351,7 @@ xfs_reflink_remap_range(
goto out_unlock;
/* Don't share DAX file data for now. */
- if (IS_DAX(inode_in) || IS_DAX(inode_out))
+ if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out))
goto out_unlock;
ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79c413985305..a4310a95011b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1909,6 +1909,8 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
+#define IS_FSDAX(inode) (IS_ENABLED(CONFIG_FS_DAX) && IS_DAX(inode))
+
static inline bool HAS_UNMAPPED_ID(struct inode *inode)
{
return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
* [PATCH v3 4/6] dax: fix S_DAX definition
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Jan Kara, linux-kernel, stable, linux-mm, Alexander Viro,
linux-fsdevel, Christoph Hellwig
Make sure S_DAX is defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y
case. Otherwise vma_is_dax() may incorrectly return false in the
Device-DAX case.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a4310a95011b..7418341578a3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1859,7 +1859,7 @@ struct super_operations {
#define S_IMA 1024 /* Inode has an associated IMA struct */
#define S_AUTOMOUNT 2048 /* Automount/referral quasi-directory */
#define S_NOSEC 4096 /* no suid or xattr security attributes */
-#ifdef CONFIG_FS_DAX
+#if IS_ENABLED(CONFIG_FS_DAX) || IS_ENABLED(CONFIG_DEV_DAX)
#define S_DAX 8192 /* Direct Access, avoiding the page cache */
#else
#define S_DAX 0 /* Make all the DAX code disappear */
* [PATCH v3 5/6] dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Jan Kara, linux-kernel, linux-mm, Alexander Viro, linux-fsdevel,
Christoph Hellwig
Do not bother looking up the file type in the case when Filesystem-DAX
is disabled at build time.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
include/linux/fs.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7418341578a3..c97fc4dbaae1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3197,6 +3197,8 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma)
if (!vma->vm_file)
return false;
+ if (!IS_ENABLED(CONFIG_FS_DAX))
+ return false;
if (!vma_is_dax(vma))
return false;
inode = file_inode(vma->vm_file);
* [PATCH v3 6/6] vfio: disable filesystem-dax page pinning
From: Dan Williams @ 2018-02-24 0:43 UTC
To: linux-nvdimm
Cc: Michal Hocko, kvm, linux-kernel, stable, linux-mm,
Alex Williamson, linux-fsdevel, Christoph Hellwig
Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection, a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. To prevent userspace from holding off
filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX
mappings.
RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.
Note that xfs and ext4 still report:
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..45657e2b1ff7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
{
struct page *page[1];
struct vm_area_struct *vma;
+ struct vm_area_struct *vmas[1];
int ret;
if (mm == current->mm) {
- ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
- page);
+ ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
+ page, vmas);
} else {
unsigned int flags = 0;
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
down_read(&mm->mmap_sem);
ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
- NULL, NULL);
+ vmas, NULL);
+ /*
+ * The lifetime of a vaddr_get_pfn() page pin is
+ * userspace-controlled. In the fs-dax case this could
+ * lead to indefinite stalls in filesystem operations.
+ * Disallow attempts to pin fs-dax pages via this
+ * interface.
+ */
+ if (ret > 0 && vma_is_fsdax(vmas[0])) {
+ ret = -EOPNOTSUPP;
+ put_page(page[0]);
+ }
up_read(&mm->mmap_sem);
}
* Re: [PATCH v3 1/6] dax: fix vma_is_fsdax() helper
From: Jan Kara @ 2018-02-26 9:51 UTC
To: Dan Williams
Cc: Jane Chu, linux-nvdimm, linux-kernel, stable, linux-mm,
Gerd Rausch, linux-fsdevel
On Fri 23-02-18 16:43:11, Dan Williams wrote:
> Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
> S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
> device-dax instances when those are meant to be explicitly allowed.
>
> Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm")
> Cc: <stable@vger.kernel.org>
> Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
> Acked-by: Jane Chu <jane.chu@oracle.com>
> Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
I wonder how I didn't notice this when reading the original patch. Anyway
the fix looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Jan Kara @ 2018-02-26 10:06 UTC
To: Dan Williams
Cc: Darrick J. Wong, linux-nvdimm, linux-kernel, stable, linux-xfs,
linux-mm, linux-fsdevel, kbuild test robot
On Fri 23-02-18 16:43:27, Dan Williams wrote:
> Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
> mechanism besides the plain IS_DAX() check is needed to compile out dead
> filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
> with:
>
> ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
> ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
> ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!
>
> This compile failure was previously hidden by the fact that S_DAX was
> erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.
>
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: linux-xfs@vger.kernel.org
> Cc: <stable@vger.kernel.org>
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As much as I appreciate that relying on the compiler to optimize out dead
branches results in nicer-looking code, this is an example where it
backfires. Also, having IS_DAX() and IS_FSDAX() doing almost the same
thing, but not exactly the same, is IMHO a recipe for confusion (e.g. a
casual reader could wonder why ext4 gets away with using IS_DAX while XFS
has to use IS_FSDAX). So I'd just prefer to handle this as is usual in
other kernel areas: define empty stubs for all exported functions when
CONFIG_FS_DAX is not enabled. That way code can stay free of ugly ifdefs
and we don't have to bother with the IS_FSDAX vs IS_DAX distinction in
filesystem code. Thoughts?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v3 5/6] dax: short circuit vma_is_fsdax() in the CONFIG_FS_DAX=n case
From: Jan Kara @ 2018-02-26 10:08 UTC
To: Dan Williams
Cc: Jan Kara, linux-nvdimm, linux-kernel, linux-mm, Alexander Viro,
linux-fsdevel, Christoph Hellwig
On Fri 23-02-18 16:43:37, Dan Williams wrote:
> Do not bother looking up the file type in the case when Filesystem-DAX
> is disabled at build time.
>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v3 3/6] xfs, dax: introduce IS_FSDAX()
From: Dan Williams @ 2018-02-26 15:48 UTC
To: Jan Kara
Cc: linux-nvdimm, Darrick J. Wong, Linux Kernel Mailing List, stable,
linux-xfs, Linux MM, linux-fsdevel, kbuild test robot
On Mon, Feb 26, 2018 at 2:06 AM, Jan Kara <jack@suse.cz> wrote:
> On Fri 23-02-18 16:43:27, Dan Williams wrote:
>> Given that S_DAX is non-zero in the FS_DAX=n + DEV_DAX=y case, another
>> mechanism besides the plain IS_DAX() check is needed to compile out dead
>> filesystem-dax code paths. Without IS_FSDAX(), xfs fails at link time
>> with:
>>
>> ERROR: "dax_finish_sync_fault" [fs/xfs/xfs.ko] undefined!
>> ERROR: "dax_iomap_fault" [fs/xfs/xfs.ko] undefined!
>> ERROR: "dax_iomap_rw" [fs/xfs/xfs.ko] undefined!
>>
>> This compile failure was previously hidden by the fact that S_DAX was
>> erroneously defined to '0' in the FS_DAX=n + DEV_DAX=y case.
>>
>> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
>> Cc: linux-xfs@vger.kernel.org
>> Cc: <stable@vger.kernel.org>
>> Reported-by: kbuild test robot <fengguang.wu@intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> As much as I appreciate that relying on the compiler to optimize out dead
> branches results in nicer-looking code, this is an example where it
> backfires. Also, having IS_DAX() and IS_FSDAX() doing almost the same
> thing, but not exactly the same, is IMHO a recipe for confusion (e.g. a
> casual reader could wonder why ext4 gets away with using IS_DAX while XFS
> has to use IS_FSDAX). So I'd just prefer to handle this as is usual in
> other kernel areas: define empty stubs for all exported functions when
> CONFIG_FS_DAX is not enabled. That way code can stay free of ugly ifdefs
> and we don't have to bother with the IS_FSDAX vs IS_DAX distinction in
> filesystem code. Thoughts?
>
I think my patch is incomplete either way, because the current
IS_DAX() usages handle more than just compiling out calls to fs/dax.c
symbols. I.e. even if there were stubs for all fs/dax.c call-outs,
there are still local usages of the helper. Let's kill IS_DAX() and
only have IS_FSDAX() and IS_DEVDAX() with the S_ISCHR() check. Any
issues with that?