nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/6] fs, xfs: block map immutable files
@ 2017-08-11  6:39 Dan Williams
  2017-08-11  6:39 ` [PATCH v3 1/6] fs, xfs: introduce S_IOMAP_IMMUTABLE Dan Williams
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, Andrew Morton, linux-nvdimm, linux-api,
	Trond Myklebust, Dave Chinner, linux-kernel, Anna Schumaker,
	linux-xfs, linux-mm, Alexander Viro, luto, linux-fsdevel,
	Christoph Hellwig

Changes since v2 [1]:
* Rather than have an IS_IOMAP_IMMUTABLE() check in
  xfs_alloc_file_space(), place one centrally in xfs_bmapi_write() to
  catch all attempts to write the block allocation map. (Dave)

* Make sealing an already sealed file, or unsealing an already unsealed
  file return success (Darrick)

* Set S_IOMAP_IMMUTABLE along with the transaction that sets
  XFS_DIFLAG2_IOMAP_IMMUTABLE (Darrick)

* Round the range of the allocation and extent conversion performed by
  FALLOC_FL_SEAL_BLOCK_MAP up to the filesystem block size.

* Add a proof-of-concept patch for the use of immutable files with swap.

[1]: https://lkml.org/lkml/2017/8/3/996

---

The ability to make the physical block-allocation map of a file
immutable is a powerful mechanism that allows userspace to have
predictable dax-fault latencies, flush dax mappings to persistent memory
without a syscall, and otherwise enable access to storage directly
without ongoing mediation from the filesystem.

This last aspect of direct storage addressability has been called a
"horrible abuse" [2], but the reality is quite the reverse. Enabling
files to be block-map immutable allows applications that would otherwise
need to rely on dangerous raw device access to instead use a filesystem.
Security, naming, re-provisioning capacity between usages are all better
supported with safe semantics in a filesystem compared to a device file.

It is time to "give up the idea that only the filesystem can access the
storage underlying the filesystem" [3] to enable a better / safer
alternative to using a raw device for userpace block servers, dax
hypervisors, and peer-to-peer transfers to name a few use cases.

[2]: https://lkml.org/lkml/2017/8/5/56
[3]: https://lkml.org/lkml/2017/8/6/299

---

Dan Williams (6):
      fs, xfs: introduce S_IOMAP_IMMUTABLE
      fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
      fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP
      xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE
      xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate
      mm, xfs: protect swapfile contents with immutable + unwritten extents


 fs/attr.c                   |   10 +++
 fs/nfs/file.c               |    7 ++
 fs/open.c                   |   24 +++++++
 fs/read_write.c             |    3 +
 fs/xfs/libxfs/xfs_bmap.c    |    6 ++
 fs/xfs/libxfs/xfs_bmap.h    |   12 +++-
 fs/xfs/libxfs/xfs_format.h  |    5 +-
 fs/xfs/xfs_aops.c           |   54 ++++++++++++++++
 fs/xfs/xfs_bmap_util.c      |  142 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_bmap_util.h      |    5 ++
 fs/xfs/xfs_file.c           |   16 ++++-
 fs/xfs/xfs_inode.c          |    2 +
 fs/xfs/xfs_ioctl.c          |    7 ++
 fs/xfs/xfs_iops.c           |    8 ++
 include/linux/falloc.h      |    4 +
 include/linux/fs.h          |    2 +
 include/uapi/linux/falloc.h |   18 +++++
 include/uapi/linux/fs.h     |    1 
 mm/filemap.c                |    5 ++
 mm/page_io.c                |    1 
 mm/swapfile.c               |   20 ++----
 21 files changed, 328 insertions(+), 24 deletions(-)
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/6] fs, xfs: introduce S_IOMAP_IMMUTABLE
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11  6:39 ` [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP Dan Williams
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-xfs, Alexander Viro, luto, linux-fsdevel,
	Christoph Hellwig

An inode with this flag set indicates that the file's block map cannot
be changed from the currently allocated set.

The implementation of toggling the flag and sealing the state of the
extent map is saved for a later patch. The functionality provided by
S_IOMAP_IMMUTABLE, once toggle support is added, will be a superset of
that provided by S_SWAPFILE, and it is targeted to replace it.

For now, only xfs and the core vfs are updated to consider the new flag.

The additional checks that are added for this flag, beyond what we are
already doing for swapfiles, are:
* fail writes that try to extend the file size
* fail attempts to directly change the allocation map via fallocate or
  xfs ioctls. This can be done centrally by blocking
  xfs_alloc_file_space and xfs_free_file_space when the flag is set.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Suggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/attr.c                |   10 ++++++++++
 fs/open.c                |    6 ++++++
 fs/read_write.c          |    3 +++
 fs/xfs/libxfs/xfs_bmap.c |    5 +++++
 fs/xfs/xfs_bmap_util.c   |    3 +++
 fs/xfs/xfs_ioctl.c       |    6 ++++++
 include/linux/fs.h       |    2 ++
 mm/filemap.c             |    5 +++++
 8 files changed, 40 insertions(+)

diff --git a/fs/attr.c b/fs/attr.c
index 135304146120..8573e364bd06 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -112,6 +112,16 @@ EXPORT_SYMBOL(setattr_prepare);
  */
 int inode_newsize_ok(const struct inode *inode, loff_t offset)
 {
+	if (IS_IOMAP_IMMUTABLE(inode)) {
+		/*
+		 * Any size change is disallowed. Size increases may
+		 * dirty metadata that an application is not prepared to
+		 * sync, and a size decrease may expose free blocks to
+		 * in-flight DMA.
+		 */
+		return -ETXTBSY;
+	}
+
 	if (inode->i_size < offset) {
 		unsigned long limit;
 
diff --git a/fs/open.c b/fs/open.c
index 35bb784763a4..7395860d7164 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -292,6 +292,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -ETXTBSY;
 
 	/*
+	 * We cannot allow any allocation changes on an iomap immutable file
+	 */
+	if (IS_IOMAP_IMMUTABLE(inode))
+		return -ETXTBSY;
+
+	/*
 	 * Revalidate the write permissions, in case security policy has
 	 * changed since the files were opened.
 	 */
diff --git a/fs/read_write.c b/fs/read_write.c
index 0cc7033aa413..dc673be7c7cb 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1706,6 +1706,9 @@ int vfs_clone_file_prep_inodes(struct inode *inode_in, loff_t pos_in,
 	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
 		return -ETXTBSY;
 
+	if (IS_IOMAP_IMMUTABLE(inode_in) || IS_IOMAP_IMMUTABLE(inode_out))
+		return -ETXTBSY;
+
 	/* Don't reflink dirs, pipes, sockets... */
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
 		return -EISDIR;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a2d64666cdd4..0535a7f34d2a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4481,6 +4481,11 @@ xfs_bmapi_write(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/* fail any attempts to mutate data extents */
+	if (IS_IOMAP_IMMUTABLE(VFS_I(ip))
+			&& !(flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ATTRFORK)))
+		return -ETXTBSY;
+
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 93e955262d07..8427b0003c79 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1294,6 +1294,9 @@ xfs_free_file_space(
 
 	trace_xfs_free_file_space(ip);
 
+	if (IS_IOMAP_IMMUTABLE(VFS_I(ip)))
+		return -ETXTBSY;
+
 	error = xfs_qm_dqattach(ip, 0);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index e75c40a47b7d..2e64488bc4de 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1755,6 +1755,12 @@ xfs_ioc_swapext(
 		goto out_put_tmp_file;
 	}
 
+	if (IS_IOMAP_IMMUTABLE(file_inode(f.file)) ||
+	    IS_IOMAP_IMMUTABLE(file_inode(tmp.file))) {
+		error = -EINVAL;
+		goto out_put_tmp_file;
+	}
+
 	/*
 	 * We need to ensure that the fds passed in point to XFS inodes
 	 * before we cast and access them as XFS structures as we have no
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..0a254b768855 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1829,6 +1829,7 @@ struct super_operations {
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
 #endif
+#define S_IOMAP_IMMUTABLE 16384 /* logical-to-physical extent map is fixed */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1867,6 +1868,7 @@ struct super_operations {
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_IOMAP_IMMUTABLE(inode) ((inode)->i_flags & S_IOMAP_IMMUTABLE)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
diff --git a/mm/filemap.c b/mm/filemap.c
index a49702445ce0..a4105a4c1d69 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2806,6 +2806,11 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 	if (unlikely(pos >= inode->i_sb->s_maxbytes))
 		return -EFBIG;
 
+	/* Are we about to mutate the block map on an immutable file? */
+	if (IS_IOMAP_IMMUTABLE(inode)
+			&& (pos + iov_iter_count(from) > i_size_read(inode)))
+		return -ETXTBSY;
+
 	iov_iter_truncate(from, inode->i_sb->s_maxbytes - pos);
 	return iov_iter_count(from);
 }

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
  2017-08-11  6:39 ` [PATCH v3 1/6] fs, xfs: introduce S_IOMAP_IMMUTABLE Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11 23:27   ` Dave Chinner
  2017-08-11  6:39 ` [PATCH v3 3/6] fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP Dan Williams
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-xfs, Alexander Viro, luto, linux-fsdevel,
	Christoph Hellwig

>>From falloc.h:

    FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
    file logical-to-physical extent offset mappings in the file. The
    purpose is to allow an application to assume that there are no holes
    or shared extents in the file and that the metadata needed to find
    all the physical extents of the file is stable and can never be
    dirtied.

For now this patch only permits setting the in-memory state of
S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
saved for later patches.

The implementation is careful to not allow the immutable state to change
while any process might have any established mappings. It reuses the
existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
while it validates the file is in the proper state and sets
S_IOMAP_IMMUTABLE.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Suggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/open.c                   |   16 ++++++++--
 fs/xfs/xfs_bmap_util.c      |   72 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_bmap_util.h      |    2 +
 fs/xfs/xfs_file.c           |   14 ++++++--
 include/linux/falloc.h      |    3 +-
 include/uapi/linux/falloc.h |   17 ++++++++++
 6 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 7395860d7164..76f57f7465c4 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -273,6 +273,17 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	    (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
 		return -EINVAL;
 
+	/*
+	 * Seal block map operation should only be used exclusively, and
+	 * with the IMMUTABLE capability.
+	 */
+	if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
+		if (!capable(CAP_LINUX_IMMUTABLE))
+			return -EPERM;
+		if (mode & ~FALLOC_FL_SEAL_BLOCK_MAP)
+			return -EINVAL;
+	}
+
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 
@@ -292,9 +303,10 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -ETXTBSY;
 
 	/*
-	 * We cannot allow any allocation changes on an iomap immutable file
+	 * We cannot allow any allocation changes on an iomap immutable file,
+	 * but we can allow the fs to validate if this request is redundant.
 	 */
-	if (IS_IOMAP_IMMUTABLE(inode))
+	if (IS_IOMAP_IMMUTABLE(inode) && !(mode & FALLOC_FL_SEAL_BLOCK_MAP))
 		return -ETXTBSY;
 
 	/*
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8427b0003c79..2ac8f4ed5723 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1390,6 +1390,78 @@ xfs_zero_file_space(
 
 }
 
+int
+xfs_seal_file_space(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	xfs_off_t		len)
+{
+	struct inode		*inode = VFS_I(ip);
+	struct address_space	*mapping = inode->i_mapping;
+	struct xfs_mount	*mp = ip->i_mount;
+	uint			blksize;
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL));
+
+	if (offset)
+		return -EINVAL;
+
+	/* Before we unshare + allocate, validate if there is any work to do. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	error = -EINVAL;
+	if (len != i_size_read(inode))
+		goto out_unlock;
+
+	error = 0;
+	if (IS_IOMAP_IMMUTABLE(inode))
+		goto out_unlock;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+
+	error = xfs_reflink_unshare(ip, offset, len);
+	if (error)
+		return error;
+
+	blksize = 1 << mp->m_sb.sb_blocklog;
+	error = xfs_alloc_file_space(ip, round_down(offset, blksize),
+			round_up(offset + len, blksize) -
+			round_down(offset, blksize),
+			XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO);
+	if (error)
+		return error;
+
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	/*
+	 * Did we race a size change? Note that since we hold the mmap
+	 * and i/o locks we cannot race hole punch.
+	 */
+	error = -EINVAL;
+	if (len != i_size_read(inode))
+		goto out_unlock;
+
+	/*
+	 * Allow DAX path to assume that the state of S_IOMAP_IMMUTABLE
+	 * will never change while any mapping is established.
+	 */
+	error = -EBUSY;
+	if (mapping_mapped(mapping))
+		goto out_unlock;
+
+	/* Did we race someone attempting to share extents? */
+	if (xfs_is_reflink_inode(ip))
+		goto out_unlock;
+
+	error = 0;
+	inode->i_flags |= S_IOMAP_IMMUTABLE;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return error;
+}
+
 /*
  * @next_fsb will keep track of the extent currently undergoing shift.
  * @stop_fsb will keep track of the extent at which we have to stop.
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 0cede1043571..5115a32a2483 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -60,6 +60,8 @@ int	xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
 int	xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
+int	xfs_seal_file_space(struct xfs_inode *, xfs_off_t offset,
+				xfs_off_t len);
 
 /* EOF block manipulation functions */
 bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c4893e226fd8..e21121530a90 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -739,7 +739,8 @@ xfs_file_write_iter(
 #define	XFS_FALLOC_FL_SUPPORTED						\
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
 		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
-		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
+		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |	\
+		 FALLOC_FL_SEAL_BLOCK_MAP)
 
 STATIC long
 xfs_file_fallocate(
@@ -834,9 +835,14 @@ xfs_file_fallocate(
 				error = xfs_reflink_unshare(ip, offset, len);
 				if (error)
 					goto out_unlock;
-			}
-			error = xfs_alloc_file_space(ip, offset, len,
-						     XFS_BMAPI_PREALLOC);
+
+				error = xfs_alloc_file_space(ip, offset, len,
+						XFS_BMAPI_PREALLOC);
+			} else if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
+				error = xfs_seal_file_space(ip, offset, len);
+			} else
+				error = xfs_alloc_file_space(ip, offset, len,
+						XFS_BMAPI_PREALLOC);
 		}
 		if (error)
 			goto out_unlock;
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 7494dc67c66f..48546c6fbec7 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -26,6 +26,7 @@ struct space_resv {
 					 FALLOC_FL_COLLAPSE_RANGE |	\
 					 FALLOC_FL_ZERO_RANGE |		\
 					 FALLOC_FL_INSERT_RANGE |	\
-					 FALLOC_FL_UNSHARE_RANGE)
+					 FALLOC_FL_UNSHARE_RANGE |	\
+					 FALLOC_FL_SEAL_BLOCK_MAP)
 
 #endif /* _FALLOC_H_ */
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index b075f601919b..e3867cfe31d5 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -76,4 +76,21 @@
  */
 #define FALLOC_FL_UNSHARE_RANGE		0x40
 
+/*
+ * FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
+ * file logical-to-physical extent offset mappings in the file. The
+ * purpose is to allow an application to assume that there are no holes
+ * or shared extents in the file and that the metadata needed to find
+ * all the physical extents of the file is stable and can never be
+ * dirtied.
+ *
+ * The immutable property is in effect for the entire inode, so the
+ * range for this operation must start at offset 0 and len must be
+ * greater than or equal to the current size of the file. If greater,
+ * this operation allocates, unshares, and seals in one atomic step.
+ *
+ * This flag implies FALLOC_FL_UNSHARE_RANGE and as such cannot be used
+ * with the punch, zero, collapse, or insert range modes.
+ */
+#define FALLOC_FL_SEAL_BLOCK_MAP	0x080
 #endif /* _UAPI_FALLOC_H_ */

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/6] fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
  2017-08-11  6:39 ` [PATCH v3 1/6] fs, xfs: introduce S_IOMAP_IMMUTABLE Dan Williams
  2017-08-11  6:39 ` [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11  6:39 ` [PATCH v3 4/6] xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE Dan Williams
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-xfs, Alexander Viro, luto, linux-fsdevel,
	Christoph Hellwig

Provide an explicit fallocate operation type for clearing the
S_IOMAP_IMMUTABLE flag. Like the enable case it requires CAP_IMMUTABLE
and it can only be performed while no process has the file mapped.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/open.c                   |   20 ++++++++++++------
 fs/xfs/xfs_bmap_util.c      |   47 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_bmap_util.h      |    3 +++
 fs/xfs/xfs_file.c           |    4 +++-
 include/linux/falloc.h      |    3 ++-
 include/uapi/linux/falloc.h |    1 +
 6 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 76f57f7465c4..3075599f1c55 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -274,13 +274,17 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -EINVAL;
 
 	/*
-	 * Seal block map operation should only be used exclusively, and
-	 * with the IMMUTABLE capability.
+	 * Seal/unseal block map operations should only be used
+	 * exclusively, and with the IMMUTABLE capability.
 	 */
-	if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
+	if (mode & (FALLOC_FL_SEAL_BLOCK_MAP | FALLOC_FL_UNSEAL_BLOCK_MAP)) {
 		if (!capable(CAP_LINUX_IMMUTABLE))
 			return -EPERM;
-		if (mode & ~FALLOC_FL_SEAL_BLOCK_MAP)
+		if (mode == (FALLOC_FL_SEAL_BLOCK_MAP
+					| FALLOC_FL_UNSEAL_BLOCK_MAP))
+			return -EINVAL;
+		if (mode & ~(FALLOC_FL_SEAL_BLOCK_MAP
+					| FALLOC_FL_UNSEAL_BLOCK_MAP))
 			return -EINVAL;
 	}
 
@@ -303,10 +307,12 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -ETXTBSY;
 
 	/*
-	 * We cannot allow any allocation changes on an iomap immutable file,
-	 * but we can allow the fs to validate if this request is redundant.
+	 * We cannot allow any allocation changes on an iomap immutable
+	 * file, but we can allow the fs to validate if this request is
+	 * redundant, or unseal the block map.
 	 */
-	if (IS_IOMAP_IMMUTABLE(inode) && !(mode & FALLOC_FL_SEAL_BLOCK_MAP))
+	if (IS_IOMAP_IMMUTABLE(inode) && !(mode & (FALLOC_FL_SEAL_BLOCK_MAP
+					| FALLOC_FL_UNSEAL_BLOCK_MAP)))
 		return -ETXTBSY;
 
 	/*
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2ac8f4ed5723..888bae801961 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1462,6 +1462,53 @@ xfs_seal_file_space(
 	return error;
 }
 
+int
+xfs_unseal_file_space(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	xfs_off_t		len)
+{
+	struct inode		*inode = VFS_I(ip);
+	struct address_space	*mapping = inode->i_mapping;
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL));
+
+	if (offset)
+		return -EINVAL;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	/*
+	 * It does not make sense to unseal less than the full range of
+	 * the file.
+	 */
+	error = -EINVAL;
+	if (len != i_size_read(inode))
+		goto out_unlock;
+
+	/* Are we already unsealed? */
+	error = 0;
+	if (!IS_IOMAP_IMMUTABLE(inode))
+		goto out_unlock;
+
+	/*
+	 * Provide safety against one thread changing the policy of not
+	 * requiring fsync/msync (for block allocations) behind another
+	 * thread's back.
+	 */
+	error = -EBUSY;
+	if (mapping_mapped(mapping))
+		goto out_unlock;
+
+	error = 0;
+	inode->i_flags &= ~S_IOMAP_IMMUTABLE;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return error;
+}
+
 /*
  * @next_fsb will keep track of the extent currently undergoing shift.
  * @stop_fsb will keep track of the extent at which we have to stop.
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 5115a32a2483..b64653a75942 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -62,6 +62,9 @@ int	xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
 int	xfs_seal_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
+int	xfs_unseal_file_space(struct xfs_inode *, xfs_off_t offset,
+				xfs_off_t len);
+
 
 /* EOF block manipulation functions */
 bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e21121530a90..833f77700be2 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -740,7 +740,7 @@ xfs_file_write_iter(
 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
 		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
 		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |	\
-		 FALLOC_FL_SEAL_BLOCK_MAP)
+		 FALLOC_FL_SEAL_BLOCK_MAP | FALLOC_FL_UNSEAL_BLOCK_MAP)
 
 STATIC long
 xfs_file_fallocate(
@@ -840,6 +840,8 @@ xfs_file_fallocate(
 						XFS_BMAPI_PREALLOC);
 			} else if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
 				error = xfs_seal_file_space(ip, offset, len);
+			} else if (mode & FALLOC_FL_UNSEAL_BLOCK_MAP) {
+				error = xfs_unseal_file_space(ip, offset, len);
 			} else
 				error = xfs_alloc_file_space(ip, offset, len,
 						XFS_BMAPI_PREALLOC);
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 48546c6fbec7..b22c1368ed1e 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -27,6 +27,7 @@ struct space_resv {
 					 FALLOC_FL_ZERO_RANGE |		\
 					 FALLOC_FL_INSERT_RANGE |	\
 					 FALLOC_FL_UNSHARE_RANGE |	\
-					 FALLOC_FL_SEAL_BLOCK_MAP)
+					 FALLOC_FL_SEAL_BLOCK_MAP |	\
+					 FALLOC_FL_UNSEAL_BLOCK_MAP)
 
 #endif /* _FALLOC_H_ */
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index e3867cfe31d5..5509e6216448 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -93,4 +93,5 @@
  * with the punch, zero, collapse, or insert range modes.
  */
 #define FALLOC_FL_SEAL_BLOCK_MAP	0x080
+#define FALLOC_FL_UNSEAL_BLOCK_MAP	0x100
 #endif /* _UAPI_FALLOC_H_ */

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/6] xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
                   ` (2 preceding siblings ...)
  2017-08-11  6:39 ` [PATCH v3 3/6] fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11  6:39 ` [PATCH v3 5/6] xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate Dan Williams
  2017-08-11  6:39 ` [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents Dan Williams
  5 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-xfs, luto, linux-fsdevel, Christoph Hellwig

Add an on-disk inode flag to record the state of the S_IOMAP_IMMUTABLE
in-memory vfs inode flags. This allows the protections against reflink
and hole punch to be automatically restored on a sub-sequent boot when
the in-memory inode is established.

The FS_XFLAG_IOMAP_IMMUTABLE is introduced to allow xfs_io to read the
state of the flag, but toggling the flag requires going through
fallocate(FALLOC_FL_[UN]SEAL_BLOCK_MAP). Support for toggling this
on-disk state is saved for a later patch.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/xfs/libxfs/xfs_format.h |    5 ++++-
 fs/xfs/xfs_inode.c         |    2 ++
 fs/xfs/xfs_ioctl.c         |    1 +
 fs/xfs/xfs_iops.c          |    8 +++++---
 include/uapi/linux/fs.h    |    1 +
 5 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d4d9bef20c3a..9e720e55776b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1063,12 +1063,15 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_DAX_BIT	0	/* use DAX for this inode */
 #define XFS_DIFLAG2_REFLINK_BIT	1	/* file's blocks may be shared */
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
+#define XFS_DIFLAG2_IOMAP_IMMUTABLE_BIT 3 /* set S_IOMAP_IMMUTABLE for this inode */
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
+#define XFS_DIFLAG2_IOMAP_IMMUTABLE (1 << XFS_DIFLAG2_IOMAP_IMMUTABLE_BIT)
 
 #define XFS_DIFLAG2_ANY \
-	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)
+	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
+	 XFS_DIFLAG2_IOMAP_IMMUTABLE)
 
 /*
  * Inode number format:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ceef77c0416a..4ca22e272ce6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -674,6 +674,8 @@ _xfs_dic2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (di_flags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (di_flags2 & XFS_DIFLAG2_IOMAP_IMMUTABLE)
+			flags |= FS_XFLAG_IOMAP_IMMUTABLE;
 	}
 
 	if (has_attr)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 2e64488bc4de..df2eef0f9d45 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -978,6 +978,7 @@ xfs_set_diflags(
 		return;
 
 	di_flags2 = (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK);
+	di_flags2 |= (ip->i_d.di_flags2 & XFS_DIFLAG2_IOMAP_IMMUTABLE);
 	if (xflags & FS_XFLAG_DAX)
 		di_flags2 |= XFS_DIFLAG2_DAX;
 	if (xflags & FS_XFLAG_COWEXTSIZE)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 469c9fa4c178..174ef95453f5 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1186,9 +1186,10 @@ xfs_diflags_to_iflags(
 	struct xfs_inode	*ip)
 {
 	uint16_t		flags = ip->i_d.di_flags;
+	uint64_t		flags2 = ip->i_d.di_flags2;
 
 	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
-			    S_NOATIME | S_DAX);
+			    S_NOATIME | S_DAX | S_IOMAP_IMMUTABLE);
 
 	if (flags & XFS_DIFLAG_IMMUTABLE)
 		inode->i_flags |= S_IMMUTABLE;
@@ -1201,9 +1202,10 @@ xfs_diflags_to_iflags(
 	if (S_ISREG(inode->i_mode) &&
 	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
 	    !xfs_is_reflink_inode(ip) &&
-	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
-	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+	    (ip->i_mount->m_flags & XFS_MOUNT_DAX || flags2 & XFS_DIFLAG2_DAX))
 		inode->i_flags |= S_DAX;
+	if (flags2 & XFS_DIFLAG2_IOMAP_IMMUTABLE)
+		inode->i_flags |= S_IOMAP_IMMUTABLE;
 }
 
 /*
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index b7495d05e8de..4765e024ad74 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -182,6 +182,7 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#define FS_XFLAG_IOMAP_IMMUTABLE 0x00020000	/* block map immutable */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 5/6] xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
                   ` (3 preceding siblings ...)
  2017-08-11  6:39 ` [PATCH v3 4/6] xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11  6:39 ` [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents Dan Williams
  5 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: Jan Kara, linux-nvdimm, linux-api, Dave Chinner, linux-kernel,
	linux-xfs, luto, linux-fsdevel, Christoph Hellwig

After validating the state of the file as not having holes, shared
extents, or active mappings try to commit the
XFS_DIFLAG2_IOMAP_IMMUTABLE flag to the on-disk inode metadata. If that
succeeds then allow the S_IOMAP_IMMUTABLE to be set on the vfs inode.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/xfs/xfs_bmap_util.c |   38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 888bae801961..26289d553760 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1401,6 +1401,7 @@ xfs_seal_file_space(
 	struct xfs_mount	*mp = ip->i_mount;
 	uint			blksize;
 	int			error;
+	struct xfs_trans	*tp;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL));
 
@@ -1431,6 +1432,10 @@ xfs_seal_file_space(
 	if (error)
 		return error;
 
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	/*
@@ -1439,7 +1444,7 @@ xfs_seal_file_space(
 	 */
 	error = -EINVAL;
 	if (len != i_size_read(inode))
-		goto out_unlock;
+		goto out_cancel;
 
 	/*
 	 * Allow DAX path to assume that the state of S_IOMAP_IMMUTABLE
@@ -1447,15 +1452,20 @@ xfs_seal_file_space(
 	 */
 	error = -EBUSY;
 	if (mapping_mapped(mapping))
-		goto out_unlock;
+		goto out_cancel;
 
 	/* Did we race someone attempting to share extents? */
 	if (xfs_is_reflink_inode(ip))
-		goto out_unlock;
+		goto out_cancel;
 
-	error = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	ip->i_d.di_flags2 |= XFS_DIFLAG2_IOMAP_IMMUTABLE;
 	inode->i_flags |= S_IOMAP_IMMUTABLE;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return xfs_trans_commit(tp);
 
+out_cancel:
+	xfs_trans_cancel(tp);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
@@ -1468,15 +1478,21 @@ xfs_unseal_file_space(
 	xfs_off_t		offset,
 	xfs_off_t		len)
 {
+	struct xfs_mount	*mp = ip->i_mount;
 	struct inode		*inode = VFS_I(ip);
 	struct address_space	*mapping = inode->i_mapping;
 	int			error;
+	struct xfs_trans	*tp;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL));
 
 	if (offset)
 		return -EINVAL;
 
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	/*
 	 * It does not make sense to unseal less than the full range of
@@ -1484,12 +1500,12 @@ xfs_unseal_file_space(
 	 */
 	error = -EINVAL;
 	if (len != i_size_read(inode))
-		goto out_unlock;
+		goto out_cancel;
 
 	/* Are we already unsealed? */
 	error = 0;
 	if (!IS_IOMAP_IMMUTABLE(inode))
-		goto out_unlock;
+		goto out_cancel;
 
 	/*
 	 * Provide safety against one thread changing the policy of not
@@ -1498,12 +1514,16 @@ xfs_unseal_file_space(
 	 */
 	error = -EBUSY;
 	if (mapping_mapped(mapping))
-		goto out_unlock;
+		goto out_cancel;
 
-	error = 0;
+	xfs_trans_ijoin(tp, ip, 0);
+	ip->i_d.di_flags2 &= ~XFS_DIFLAG2_IOMAP_IMMUTABLE;
 	inode->i_flags &= ~S_IOMAP_IMMUTABLE;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return xfs_trans_commit(tp);
 
-out_unlock:
+out_cancel:
+	xfs_trans_cancel(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
 	return error;

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents
  2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
                   ` (4 preceding siblings ...)
  2017-08-11  6:39 ` [PATCH v3 5/6] xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate Dan Williams
@ 2017-08-11  6:39 ` Dan Williams
  2017-08-11 23:33   ` Dave Chinner
  5 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2017-08-11  6:39 UTC (permalink / raw)
  To: darrick.wong
  Cc: linux-nvdimm, linux-api, Trond Myklebust, Dave Chinner,
	linux-kernel, linux-xfs, linux-mm, luto, linux-fsdevel,
	Andrew Morton, Anna Schumaker

    On Jun 22, 2017, Darrick wrote:
    > On Jun 22, 2017, Dave wrote:
    >> Hmmm, I disagree on the unwritten state here.  We want swap files to
    >> be able to use unwritten extents - it means we can preallocate the
    >> swap file and hand it straight to swapon without having to zero it
    >> (i.e. no I/O needed to demand allocate more swap space when memory
    >> is very low).  Also, anyone who tries to read the swap file from
    >> userspace will be reading unwritten extents, which will always
    >> return zeros rather than whatever is in the swap file...
    >
    > Now I've twisted all the way around to thinking that swap files
    > should be /totally/ unwritten, except for the file header. :)

This explicitly requires swapon(8) to be modified to seal the block-map
of the file before activating swap.

We could seal and activate swap in one step, but that's likely to
surprise legacy userspace that does not expect the file to take on
iomap-immutable semantics in response to swapon(2). However, a potential
follow on is a new flag to swapon(2) that specifies sealing the
block-map at ->swap_activate() time.

Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Suggested-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/nfs/file.c            |    7 +++++-
 fs/xfs/libxfs/xfs_bmap.c |    3 ++-
 fs/xfs/libxfs/xfs_bmap.h |   12 +++++++++-
 fs/xfs/xfs_aops.c        |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_io.c             |    1 +
 mm/swapfile.c            |   20 ++++++-----------
 6 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 5713eb32a45e..a786161f8580 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -489,10 +489,15 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 						sector_t *span)
 {
 	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
+	int rc;
 
 	*span = sis->pages;
 
-	return rpc_clnt_swap_activate(clnt);
+	rc = rpc_clnt_swap_activate(clnt);
+	if (rc)
+		return rc;
+	sis->flags |= SWP_FILE;
+	return add_swap_extent(sis, 0, sis->max, 0);
 }
 
 static void nfs_swap_deactivate(struct file *file)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0535a7f34d2a..3e2e604a6642 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4483,7 +4483,8 @@ xfs_bmapi_write(
 
 	/* fail any attempts to mutate data extents */
 	if (IS_IOMAP_IMMUTABLE(VFS_I(ip))
-			&& !(flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ATTRFORK)))
+			&& !(flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ATTRFORK
+					| XFS_BMAPI_FORCE)))
 		return -ETXTBSY;
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 851982a5dfbc..a0f099289520 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,15 @@ struct xfs_extent_free_item
 /* Only convert delalloc space, don't allocate entirely new extents */
 #define XFS_BMAPI_DELALLOC	0x400
 
+/*
+ * Permit extent manipulations even if S_IOMAP_IMMUTABLE is set on the
+ * inode. This is only expected to be used in the swapfile activation
+ * case where we want to mark all swap space as unwritten so that reads
+ * return zero and writes fail with ETXTBSY. Storage access in this
+ * state can only occur via swap operations.
+ */
+#define XFS_BMAPI_FORCE		0x800
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -124,7 +133,8 @@ struct xfs_extent_free_item
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
 	{ XFS_BMAPI_REMAP,	"REMAP" }, \
 	{ XFS_BMAPI_COWFORK,	"COWFORK" }, \
-	{ XFS_BMAPI_DELALLOC,	"DELALLOC" }
+	{ XFS_BMAPI_DELALLOC,	"DELALLOC" }, \
+	{ XFS_BMAPI_FORCE,	"FORCE" }
 
 
 static inline int xfs_bmapi_aflag(int w)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6bf120bb1a17..066708175168 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1418,6 +1418,58 @@ xfs_vm_set_page_dirty(
 	return newly_dirty;
 }
 
+STATIC void
+xfs_vm_swap_deactivate(
+	struct file		*file)
+{
+	struct inode		*inode = file->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error = 0;
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	if (IS_IOMAP_IMMUTABLE(inode))
+		error = xfs_alloc_file_space(ip, PAGE_SIZE,
+				i_size_read(inode) - PAGE_SIZE,
+				XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO
+				| XFS_BMAPI_FORCE);
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+
+	WARN(error, "%s failed to restore block map (%d)\n", __func__, error);
+}
+
+STATIC int
+xfs_vm_swap_activate(
+	struct swap_info_struct	*sis,
+	struct file		*file,
+	sector_t		*span)
+{
+	struct inode		*inode = file->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error = 0, nr_extents;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+	nr_extents = generic_swapfile_activate(sis, file, span);
+	if (nr_extents < 0)
+		return nr_extents;
+
+	/*
+	 * If the file is already immutable take this opportunity to
+	 * mark all extents as unwritten.  This arranges for all reads
+	 * to return 0 and all writes to fail with ETXTBSY since they
+	 * would attempt extent conversion to the 'written' state. The
+	 * swap header (PAGE_SIZE) is left alone.
+	 */
+	if (IS_IOMAP_IMMUTABLE(inode))
+		error = xfs_alloc_file_space(ip, PAGE_SIZE,
+				i_size_read(inode) - PAGE_SIZE,
+				XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT
+				| XFS_BMAPI_FORCE);
+	if (error)
+		nr_extents = error;
+	return nr_extents;
+}
+
 const struct address_space_operations xfs_address_space_operations = {
 	.readpage		= xfs_vm_readpage,
 	.readpages		= xfs_vm_readpages,
@@ -1427,6 +1479,8 @@ const struct address_space_operations xfs_address_space_operations = {
 	.releasepage		= xfs_vm_releasepage,
 	.invalidatepage		= xfs_vm_invalidatepage,
 	.bmap			= xfs_vm_bmap,
+	.swap_activate		= xfs_vm_swap_activate,
+	.swap_deactivate	= xfs_vm_swap_deactivate,
 	.direct_IO		= xfs_vm_direct_IO,
 	.migratepage		= buffer_migrate_page,
 	.is_partially_uptodate  = block_is_partially_uptodate,
diff --git a/mm/page_io.c b/mm/page_io.c
index b6c4ac388209..301e4f778ebf 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -231,6 +231,7 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
 	ret = -EINVAL;
 	goto out;
 }
+EXPORT_SYMBOL_GPL(generic_swapfile_activate);
 
 /*
  * We may have stale swap cache pages in memory: notice
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6ba4aab2db0b..6d43a392757f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2105,6 +2105,9 @@ sector_t map_swap_page(struct page *page, struct block_device **bdev)
  */
 static void destroy_swap_extents(struct swap_info_struct *sis)
 {
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+
 	while (!list_empty(&sis->first_swap_extent.list)) {
 		struct swap_extent *se;
 
@@ -2114,10 +2117,7 @@ static void destroy_swap_extents(struct swap_info_struct *sis)
 		kfree(se);
 	}
 
-	if (sis->flags & SWP_FILE) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
+	if (mapping->a_ops->swap_deactivate) {
 		sis->flags &= ~SWP_FILE;
 		mapping->a_ops->swap_deactivate(swap_file);
 	}
@@ -2168,6 +2168,7 @@ add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 	list_add_tail(&new_se->list, &sis->first_swap_extent.list);
 	return 1;
 }
+EXPORT_SYMBOL_GPL(add_swap_extent);
 
 /*
  * A `swap extent' is a simple thing which maps a contiguous range of pages
@@ -2213,15 +2214,8 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 		return ret;
 	}
 
-	if (mapping->a_ops->swap_activate) {
-		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
-		if (!ret) {
-			sis->flags |= SWP_FILE;
-			ret = add_swap_extent(sis, 0, sis->max, 0);
-			*span = sis->pages;
-		}
-		return ret;
-	}
+	if (mapping->a_ops->swap_activate)
+		return mapping->a_ops->swap_activate(sis, swap_file, span);
 
 	return generic_swapfile_activate(sis, swap_file, span);
 }

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-11  6:39 ` [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP Dan Williams
@ 2017-08-11 23:27   ` Dave Chinner
  2017-08-11 23:42     ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2017-08-11 23:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-nvdimm, linux-api, darrick.wong, linux-kernel,
	linux-xfs, Alexander Viro, luto, linux-fsdevel,
	Christoph Hellwig

On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
> >From falloc.h:
> 
>     FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
>     file logical-to-physical extent offset mappings in the file. The
>     purpose is to allow an application to assume that there are no holes
>     or shared extents in the file and that the metadata needed to find
>     all the physical extents of the file is stable and can never be
>     dirtied.
> 
> For now this patch only permits setting the in-memory state of
> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
> saved for later patches.
> 
> The implementation is careful to not allow the immutable state to change
> while any process might have any established mappings. It reuses the
> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
> while it validates the file is in the proper state and sets
> S_IOMAP_IMMUTABLE.

SO I've been thinking about this - I'm thinking that we need to
separate the changes to the extent map from the action of sealing
the extent map.

That is, I have a need to freeze an extent map without any
modification to it at all and breaking all the sharing and filling
all the holes completely screws up the file layout I need to
preserve. i.e. I want to be able to freeze the maps of a pair of
reflinked files so I can use FIEMAP to query only the changed blocks
and then read out the data in the private/changed blocks in the
reflinked file. I only need a temporary seal (if we crash it goes
away), so maybe there's another command modifier flag needed here,
too.

The DAX allocation requirements can be handled by a preallocation
call to filll holes with zeros, followed by an unshare call to break
all the COW mappings, and then the extent map can be sealed. If you
need to check for holes after sealing, SEEK_HOLE will tell you what
you need to know...

My preference really is for each fallocate command to do just one
thing - having the seal operation also modify the extent map
means it's not useful for the use cases where we need the extent map
to remain unmodified....

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents
  2017-08-11  6:39 ` [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents Dan Williams
@ 2017-08-11 23:33   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2017-08-11 23:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-api, Trond Myklebust, linux-kernel,
	Anna Schumaker, linux-xfs, linux-mm, luto, linux-fsdevel,
	Andrew Morton, darrick.wong

On Thu, Aug 10, 2017 at 11:39:49PM -0700, Dan Williams wrote:
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 851982a5dfbc..a0f099289520 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -113,6 +113,15 @@ struct xfs_extent_free_item
>  /* Only convert delalloc space, don't allocate entirely new extents */
>  #define XFS_BMAPI_DELALLOC	0x400
>  
> +/*
> + * Permit extent manipulations even if S_IOMAP_IMMUTABLE is set on the
> + * inode. This is only expected to be used in the swapfile activation
> + * case where we want to mark all swap space as unwritten so that reads
> + * return zero and writes fail with ETXTBSY. Storage access in this
> + * state can only occur via swap operations.
> + */
> +#define XFS_BMAPI_FORCE		0x800

Urk. No. Immutable means immutable.

And, as a matter of policy, we should not be changing the on disk
layout of the swapfile that is provided inside the kernel.  If the
swap file is already allocated as unwritten, great. If not, we
should not force it to be unwritten to be because then if the user
downgrades their kernel the swapfile suddenly can not be used by the
older kernel.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-11 23:27   ` Dave Chinner
@ 2017-08-11 23:42     ` Dan Williams
  2017-08-12  0:30       ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2017-08-11 23:42 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, linux-nvdimm, Linux API, Darrick J. Wong, linux-kernel,
	linux-xfs, Alexander Viro, Andy Lutomirski, linux-fsdevel,
	Christoph Hellwig

On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
>> >From falloc.h:
>>
>>     FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
>>     file logical-to-physical extent offset mappings in the file. The
>>     purpose is to allow an application to assume that there are no holes
>>     or shared extents in the file and that the metadata needed to find
>>     all the physical extents of the file is stable and can never be
>>     dirtied.
>>
>> For now this patch only permits setting the in-memory state of
>> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
>> saved for later patches.
>>
>> The implementation is careful to not allow the immutable state to change
>> while any process might have any established mappings. It reuses the
>> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
>> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
>> while it validates the file is in the proper state and sets
>> S_IOMAP_IMMUTABLE.
>
> SO I've been thinking about this - I'm thinking that we need to
> separate the changes to the extent map from the action of sealing
> the extent map.
>
> That is, I have a need to freeze an extent map without any
> modification to it at all and breaking all the sharing and filling
> all the holes completely screws up the file layout I need to
> preserve. i.e. I want to be able to freeze the maps of a pair of
> reflinked files so I can use FIEMAP to query only the changed blocks
> and then read out the data in the private/changed blocks in the
> reflinked file. I only need a temporary seal (if we crash it goes
> away), so maybe there's another command modifier flag needed here,
> too.
>
> The DAX allocation requirements can be handled by a preallocation
> call to filll holes with zeros, followed by an unshare call to break
> all the COW mappings, and then the extent map can be sealed. If you
> need to check for holes after sealing, SEEK_HOLE will tell you what
> you need to know...
>
> My preference really is for each fallocate command to do just one
> thing - having the seal operation also modify the extent map
> means it's not useful for the use cases where we need the extent map
> to remain unmodified....
>
> Thoughts?

That does seem to better follow the principle of least surprise and
make the interface more composable. However, for the DAX case do we
now end up needing a SEEK_SHARED, or something similar to check that
we sealed the file without shared extents?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-11 23:42     ` Dan Williams
@ 2017-08-12  0:30       ` Dave Chinner
  2017-08-12  2:31         ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2017-08-12  0:30 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-nvdimm, Linux API, Darrick J. Wong, linux-kernel,
	linux-xfs, Alexander Viro, Andy Lutomirski, linux-fsdevel,
	Christoph Hellwig

On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote:
> On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
> >> >From falloc.h:
> >>
> >>     FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
> >>     file logical-to-physical extent offset mappings in the file. The
> >>     purpose is to allow an application to assume that there are no holes
> >>     or shared extents in the file and that the metadata needed to find
> >>     all the physical extents of the file is stable and can never be
> >>     dirtied.
> >>
> >> For now this patch only permits setting the in-memory state of
> >> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
> >> saved for later patches.
> >>
> >> The implementation is careful to not allow the immutable state to change
> >> while any process might have any established mappings. It reuses the
> >> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
> >> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
> >> while it validates the file is in the proper state and sets
> >> S_IOMAP_IMMUTABLE.
> >
> > SO I've been thinking about this - I'm thinking that we need to
> > separate the changes to the extent map from the action of sealing
> > the extent map.
> >
> > That is, I have a need to freeze an extent map without any
> > modification to it at all and breaking all the sharing and filling
> > all the holes completely screws up the file layout I need to
> > preserve. i.e. I want to be able to freeze the maps of a pair of
> > reflinked files so I can use FIEMAP to query only the changed blocks
> > and then read out the data in the private/changed blocks in the
> > reflinked file. I only need a temporary seal (if we crash it goes
> > away), so maybe there's another command modifier flag needed here,
> > too.
> >
> > The DAX allocation requirements can be handled by a preallocation
> > call to filll holes with zeros, followed by an unshare call to break
> > all the COW mappings, and then the extent map can be sealed. If you
> > need to check for holes after sealing, SEEK_HOLE will tell you what
> > you need to know...
> >
> > My preference really is for each fallocate command to do just one
> > thing - having the seal operation also modify the extent map
> > means it's not useful for the use cases where we need the extent map
> > to remain unmodified....
> >
> > Thoughts?
> 
> That does seem to better follow the principle of least surprise and
> make the interface more composable. However, for the DAX case do we
> now end up needing a SEEK_SHARED, or something similar to check that
> we sealed the file without shared extents?

Don't we have an inode attribute flag for that? There's definitely a
flag in the on disk XFS inode to indicate that there are shared
extents on the file.

Hmmmm - for some reason it's not exposed in FS_IOC_FSGETXATTR.
Darrick? Any reason we don't expose the "this file has shared
extents" flag to userspace? How are we checking that on-disk state
in xfstests?

As it is, if we're adding an immutable extent flag, we've got to be
able to query the immutable extent state of a file from userspace so
I'm thinking that we'd need to expose it via the same interface that
exposes the immutable flag. i.e. we could add the "shared extents
present" flag to FS_IOC_FSGETXATTR at the same time...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-12  0:30       ` Dave Chinner
@ 2017-08-12  2:31         ` Darrick J. Wong
  2017-08-12  4:06           ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2017-08-12  2:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, linux-nvdimm, Linux API, linux-kernel,
	Christoph Hellwig, linux-xfs, Alexander Viro, Andy Lutomirski,
	linux-fsdevel

On Sat, Aug 12, 2017 at 10:30:34AM +1000, Dave Chinner wrote:
> On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote:
> > On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
> > >> >From falloc.h:
> > >>
> > >>     FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
> > >>     file logical-to-physical extent offset mappings in the file. The
> > >>     purpose is to allow an application to assume that there are no holes
> > >>     or shared extents in the file and that the metadata needed to find
> > >>     all the physical extents of the file is stable and can never be
> > >>     dirtied.
> > >>
> > >> For now this patch only permits setting the in-memory state of
> > >> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
> > >> saved for later patches.
> > >>
> > >> The implementation is careful to not allow the immutable state to change
> > >> while any process might have any established mappings. It reuses the
> > >> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
> > >> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
> > >> while it validates the file is in the proper state and sets
> > >> S_IOMAP_IMMUTABLE.
> > >
> > > SO I've been thinking about this - I'm thinking that we need to
> > > separate the changes to the extent map from the action of sealing
> > > the extent map.
> > >
> > > That is, I have a need to freeze an extent map without any
> > > modification to it at all and breaking all the sharing and filling
> > > all the holes completely screws up the file layout I need to
> > > preserve. i.e. I want to be able to freeze the maps of a pair of
> > > reflinked files so I can use FIEMAP to query only the changed blocks
> > > and then read out the data in the private/changed blocks in the
> > > reflinked file. I only need a temporary seal (if we crash it goes
> > > away), so maybe there's another command modifier flag needed here,
> > > too.
> > >
> > > The DAX allocation requirements can be handled by a preallocation
> > > call to filll holes with zeros, followed by an unshare call to break
> > > all the COW mappings, and then the extent map can be sealed. If you
> > > need to check for holes after sealing, SEEK_HOLE will tell you what
> > > you need to know...
> > >
> > > My preference really is for each fallocate command to do just one
> > > thing - having the seal operation also modify the extent map
> > > means it's not useful for the use cases where we need the extent map
> > > to remain unmodified....
> > >
> > > Thoughts?
> > 
> > That does seem to better follow the principle of least surprise and
> > make the interface more composable. However, for the DAX case do we
> > now end up needing a SEEK_SHARED, or something similar to check that
> > we sealed the file without shared extents?

Well, fiemap/bmap will tell you (presumably after you confirm that the
file got sealed or whatever) if there are shared extents.

> Don't we have an inode attribute flag for that? There's definitely a
> flag in the on disk XFS inode to indicate that there are shared
> extents on the file.
> 
> Hmmmm - for some reason it's not exposed in FS_IOC_FSGETXATTR.
> Darrick? Any reason we don't expose the "this file has shared
> extents" flag to userspace? How are we checking that on-disk state
> in xfstests?
> 
> As it is, if we're adding an immutable extent flag, we've got to be
> able to query the immutable extent state of a file from userspace so
> I'm thinking that we'd need to expose it via the same interface that
> exposes the immutable flag. i.e. we could add the "shared extents
> present" flag to FS_IOC_FSGETXATTR at the same time...

We used to have a FSGETXATTR flag; iirc hch nak'd it during the review.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
  2017-08-12  2:31         ` Darrick J. Wong
@ 2017-08-12  4:06           ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2017-08-12  4:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, linux-nvdimm, Linux API, linux-kernel,
	Christoph Hellwig, linux-xfs, Alexander Viro, Andy Lutomirski,
	linux-fsdevel

On Fri, Aug 11, 2017 at 07:31:54PM -0700, Darrick J. Wong wrote:
> On Sat, Aug 12, 2017 at 10:30:34AM +1000, Dave Chinner wrote:
> > On Fri, Aug 11, 2017 at 04:42:18PM -0700, Dan Williams wrote:
> > > On Fri, Aug 11, 2017 at 4:27 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > > On Thu, Aug 10, 2017 at 11:39:28PM -0700, Dan Williams wrote:
> > > >> >From falloc.h:
> > > >>
> > > >>     FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
> > > >>     file logical-to-physical extent offset mappings in the file. The
> > > >>     purpose is to allow an application to assume that there are no holes
> > > >>     or shared extents in the file and that the metadata needed to find
> > > >>     all the physical extents of the file is stable and can never be
> > > >>     dirtied.
> > > >>
> > > >> For now this patch only permits setting the in-memory state of
> > > >> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
> > > >> saved for later patches.
> > > >>
> > > >> The implementation is careful to not allow the immutable state to change
> > > >> while any process might have any established mappings. It reuses the
> > > >> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
> > > >> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
> > > >> while it validates the file is in the proper state and sets
> > > >> S_IOMAP_IMMUTABLE.
> > > >
> > > > SO I've been thinking about this - I'm thinking that we need to
> > > > separate the changes to the extent map from the action of sealing
> > > > the extent map.
> > > >
> > > > That is, I have a need to freeze an extent map without any
> > > > modification to it at all and breaking all the sharing and filling
> > > > all the holes completely screws up the file layout I need to
> > > > preserve. i.e. I want to be able to freeze the maps of a pair of
> > > > reflinked files so I can use FIEMAP to query only the changed blocks
> > > > and then read out the data in the private/changed blocks in the
> > > > reflinked file. I only need a temporary seal (if we crash it goes
> > > > away), so maybe there's another command modifier flag needed here,
> > > > too.
> > > >
> > > > The DAX allocation requirements can be handled by a preallocation
> > > > call to filll holes with zeros, followed by an unshare call to break
> > > > all the COW mappings, and then the extent map can be sealed. If you
> > > > need to check for holes after sealing, SEEK_HOLE will tell you what
> > > > you need to know...
> > > >
> > > > My preference really is for each fallocate command to do just one
> > > > thing - having the seal operation also modify the extent map
> > > > means it's not useful for the use cases where we need the extent map
> > > > to remain unmodified....
> > > >
> > > > Thoughts?
> > > 
> > > That does seem to better follow the principle of least surprise and
> > > make the interface more composable. However, for the DAX case do we
> > > now end up needing a SEEK_SHARED, or something similar to check that
> > > we sealed the file without shared extents?
> 
> Well, fiemap/bmap will tell you (presumably after you confirm that the
> file got sealed or whatever) if there are shared extents.

Iterating potentially hundreds of thousands of extents just to find
the one shared extent seems like crazy thing to ask people to do.

> > Don't we have an inode attribute flag for that? There's definitely a
> > flag in the on disk XFS inode to indicate that there are shared
> > extents on the file.
> > 
> > Hmmmm - for some reason it's not exposed in FS_IOC_FSGETXATTR.
> > Darrick? Any reason we don't expose the "this file has shared
> > extents" flag to userspace? How are we checking that on-disk state
> > in xfstests?
> > 
> > As it is, if we're adding an immutable extent flag, we've got to be
> > able to query the immutable extent state of a file from userspace so
> > I'm thinking that we'd need to expose it via the same interface that
> > exposes the immutable flag. i.e. we could add the "shared extents
> > present" flag to FS_IOC_FSGETXATTR at the same time...
> 
> We used to have a FSGETXATTR flag; iirc hch nak'd it during the review.

Ok, got a link? Seems strange to not be able to query this state
even for testing/validation purposes...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-08-12  4:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-11  6:39 [PATCH v3 0/6] fs, xfs: block map immutable files Dan Williams
2017-08-11  6:39 ` [PATCH v3 1/6] fs, xfs: introduce S_IOMAP_IMMUTABLE Dan Williams
2017-08-11  6:39 ` [PATCH v3 2/6] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP Dan Williams
2017-08-11 23:27   ` Dave Chinner
2017-08-11 23:42     ` Dan Williams
2017-08-12  0:30       ` Dave Chinner
2017-08-12  2:31         ` Darrick J. Wong
2017-08-12  4:06           ` Dave Chinner
2017-08-11  6:39 ` [PATCH v3 3/6] fs, xfs: introduce FALLOC_FL_UNSEAL_BLOCK_MAP Dan Williams
2017-08-11  6:39 ` [PATCH v3 4/6] xfs: introduce XFS_DIFLAG2_IOMAP_IMMUTABLE Dan Williams
2017-08-11  6:39 ` [PATCH v3 5/6] xfs: toggle XFS_DIFLAG2_IOMAP_IMMUTABLE in response to fallocate Dan Williams
2017-08-11  6:39 ` [PATCH v3 6/6] mm, xfs: protect swapfile contents with immutable + unwritten extents Dan Williams
2017-08-11 23:33   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).