All of lore.kernel.org
 help / color / mirror / Atom feed
* fix locking for the reflink operation
@ 2016-10-17 12:05 Christoph Hellwig
  2016-10-17 12:05 ` [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-17 12:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong

When creating a reflink we need to take the iolock much earlier, as
various early checks done in xfs_file_share_range currently are racy
without it.  Patches 1-3 sort that out in a minimal invasive way,
but I think we should just merge xfs_file_share_range and
xfs_reflink_remap_range, which is what patch 4 does.

Patches 1-3 are something I'd like to see in 4.9, patch 4 might not
fully qualify, but just getting it in might make everyones life easier.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range
  2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
@ 2016-10-17 12:05 ` Christoph Hellwig
  2016-10-17 21:11   ` Darrick J. Wong
  2016-10-17 12:05 ` [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-17 12:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong

The VFS already does the check, and the placement of this duplicate is in
the way of the following locking rework.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a314fc7..20563f2 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1003,9 +1003,6 @@ xfs_file_share_range(
 	    IS_SWAPFILE(inode_out))
 		return -ETXTBSY;
 
-	/* Reflink only works within this filesystem. */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
 	same_inode = (inode_in->i_ino == inode_out->i_ino);
 
 	/* Don't reflink dirs, pipes, sockets... */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range
  2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
  2016-10-17 12:05 ` [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range Christoph Hellwig
@ 2016-10-17 12:05 ` Christoph Hellwig
  2016-10-17 21:12   ` Darrick J. Wong
  2016-10-17 12:05 ` [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-17 12:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong

The VFS i_ino is an unsigned long, while XFS inode numbers are 64-bit
wide, so checking i_ino for equality could lead to rate false positives
on 32-bit architectures.  Just compare the inode pointers themselves
to be safe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 20563f2..012a960 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1003,7 +1003,7 @@ xfs_file_share_range(
 	    IS_SWAPFILE(inode_out))
 		return -ETXTBSY;
 
-	same_inode = (inode_in->i_ino == inode_out->i_ino);
+	same_inode = (inode_in == inode_out);
 
 	/* Don't reflink dirs, pipes, sockets... */
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
  2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
  2016-10-17 12:05 ` [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range Christoph Hellwig
  2016-10-17 12:05 ` [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range Christoph Hellwig
@ 2016-10-17 12:05 ` Christoph Hellwig
  2016-10-17 21:16   ` Darrick J. Wong
  2016-10-17 12:05 ` [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range Christoph Hellwig
  2016-10-17 21:35 ` fix locking for the reflink operation Darrick J. Wong
  4 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-17 12:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong

We need the iolock protection to stabilizie the IS_SWAPFILE and IS_IMMUTABLE
values, as well as preventing new buffered writers re-dirtying the file data
that we just wrote out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c    | 62 ++++++++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_reflink.c | 15 -------------
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 012a960..0960264 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -996,38 +996,54 @@ xfs_file_share_range(
 	inode_out = file_inode(file_out);
 	bs = inode_out->i_sb->s_blocksize;
 
+	/* Lock both files against IO */
+	same_inode = (inode_in == inode_out);
+	if (same_inode) {
+		xfs_ilock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
+		xfs_ilock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
+	} else {
+		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
+				XFS_IOLOCK_EXCL);
+		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
+				XFS_MMAPLOCK_EXCL);
+	}
+
 	/* Don't touch certain kinds of inodes */
+	ret = -EPERM;
 	if (IS_IMMUTABLE(inode_out))
-		return -EPERM;
-	if (IS_SWAPFILE(inode_in) ||
-	    IS_SWAPFILE(inode_out))
-		return -ETXTBSY;
-
-	same_inode = (inode_in == inode_out);
+		goto out_unlock;
+	ret = -ETXTBSY;
+	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
+		goto out_unlock;
 
 	/* Don't reflink dirs, pipes, sockets... */
+	ret = -EISDIR;
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-		return -EISDIR;
+		goto out_unlock;
+	ret = -EINVAL;
 	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
-		return -EINVAL;
+		goto out_unlock;
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
-		return -EINVAL;
+		goto out_unlock;
 
 	/* Don't share DAX file data for now. */
 	if (IS_DAX(inode_in) || IS_DAX(inode_out))
-		return -EINVAL;
+		goto out_unlock;
 
 	/* Are we going all the way to the end? */
 	isize = i_size_read(inode_in);
-	if (isize == 0)
-		return 0;
+	if (isize == 0) {
+		ret = 0;
+		goto out_unlock;
+	}
+
 	if (len == 0)
 		len = isize - pos_in;
 
 	/* Ensure offsets don't wrap and the input is inside i_size */
 	if (pos_in + len < pos_in || pos_out + len < pos_out ||
 	    pos_in + len > isize)
-		return -EINVAL;
+		goto out_unlock;
 
 	/* Don't allow dedupe past EOF in the dest file */
 	if (is_dedupe) {
@@ -1035,7 +1051,7 @@ xfs_file_share_range(
 
 		disize = i_size_read(inode_out);
 		if (pos_out >= disize || pos_out + len > disize)
-			return -EINVAL;
+			goto out_unlock;
 	}
 
 	/* If we're linking to EOF, continue to the block boundary. */
@@ -1047,28 +1063,32 @@ xfs_file_share_range(
 	/* Only reflink if we're aligned to block boundaries */
 	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
 	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
-		return -EINVAL;
+		goto out_unlock;
 
 	/* Don't allow overlapped reflink within the same file */
 	if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
-		return -EINVAL;
+		goto out_unlock;
 
 	/* Wait for the completion of any pending IOs on srcfile */
 	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
 	if (ret)
-		goto out;
+		goto out_unlock;
 	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
 	if (ret)
-		goto out;
+		goto out_unlock;
 
 	if (is_dedupe)
 		flags |= XFS_REFLINK_DEDUPE;
 	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
 			pos_out, len, flags);
-	if (ret < 0)
-		goto out;
 
-out:
+out_unlock:
+	xfs_iunlock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
+	if (!same_inode) {
+		xfs_iunlock(XFS_I(inode_out), XFS_MMAPLOCK_EXCL);
+		xfs_iunlock(XFS_I(inode_out), XFS_IOLOCK_EXCL);
+	}
 	return ret;
 }
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 5965e94..d012746 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1341,15 +1341,6 @@ xfs_reflink_remap_range(
 
 	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
 
-	/* Lock both files against IO */
-	if (src->i_ino == dest->i_ino) {
-		xfs_ilock(src, XFS_IOLOCK_EXCL);
-		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
-	} else {
-		xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
-		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
-	}
-
 	/*
 	 * Check that the extents are the same.
 	 */
@@ -1401,12 +1392,6 @@ xfs_reflink_remap_range(
 		goto out_error;
 
 out_error:
-	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
-	xfs_iunlock(src, XFS_IOLOCK_EXCL);
-	if (src->i_ino != dest->i_ino) {
-		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
-		xfs_iunlock(dest, XFS_IOLOCK_EXCL);
-	}
 	if (error)
 		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
 	return error;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range
  2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
                   ` (2 preceding siblings ...)
  2016-10-17 12:05 ` [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range Christoph Hellwig
@ 2016-10-17 12:05 ` Christoph Hellwig
  2016-10-17 21:29   ` Darrick J. Wong
  2016-10-17 21:35 ` fix locking for the reflink operation Darrick J. Wong
  4 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-17 12:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: darrick.wong

There is no clear division of responsibility between those functions, so
just merge them into one to keep the code simple.  Also move
xfs_file_wait_for_io to xfs_reflink.c together with its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c    | 145 -------------------------------------
 fs/xfs/xfs_reflink.c | 199 ++++++++++++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_reflink.h |   8 +--
 3 files changed, 161 insertions(+), 191 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0960264..cd37573 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -947,151 +947,6 @@ xfs_file_fallocate(
 	return error;
 }
 
-/*
- * Flush all file writes out to disk.
- */
-static int
-xfs_file_wait_for_io(
-	struct inode	*inode,
-	loff_t		offset,
-	size_t		len)
-{
-	loff_t		rounding;
-	loff_t		ioffset;
-	loff_t		iendoffset;
-	loff_t		bs;
-	int		ret;
-
-	bs = inode->i_sb->s_blocksize;
-	inode_dio_wait(inode);
-
-	rounding = max_t(xfs_off_t, bs, PAGE_SIZE);
-	ioffset = round_down(offset, rounding);
-	iendoffset = round_up(offset + len, rounding) - 1;
-	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
-					   iendoffset);
-	return ret;
-}
-
-/* Hook up to the VFS reflink function */
-STATIC int
-xfs_file_share_range(
-	struct file	*file_in,
-	loff_t		pos_in,
-	struct file	*file_out,
-	loff_t		pos_out,
-	u64		len,
-	bool		is_dedupe)
-{
-	struct inode	*inode_in;
-	struct inode	*inode_out;
-	ssize_t		ret;
-	loff_t		bs;
-	loff_t		isize;
-	int		same_inode;
-	loff_t		blen;
-	unsigned int	flags = 0;
-
-	inode_in = file_inode(file_in);
-	inode_out = file_inode(file_out);
-	bs = inode_out->i_sb->s_blocksize;
-
-	/* Lock both files against IO */
-	same_inode = (inode_in == inode_out);
-	if (same_inode) {
-		xfs_ilock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
-		xfs_ilock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
-	} else {
-		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
-				XFS_IOLOCK_EXCL);
-		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
-				XFS_MMAPLOCK_EXCL);
-	}
-
-	/* Don't touch certain kinds of inodes */
-	ret = -EPERM;
-	if (IS_IMMUTABLE(inode_out))
-		goto out_unlock;
-	ret = -ETXTBSY;
-	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
-		goto out_unlock;
-
-	/* Don't reflink dirs, pipes, sockets... */
-	ret = -EISDIR;
-	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
-		goto out_unlock;
-	ret = -EINVAL;
-	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
-		goto out_unlock;
-	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
-		goto out_unlock;
-
-	/* Don't share DAX file data for now. */
-	if (IS_DAX(inode_in) || IS_DAX(inode_out))
-		goto out_unlock;
-
-	/* Are we going all the way to the end? */
-	isize = i_size_read(inode_in);
-	if (isize == 0) {
-		ret = 0;
-		goto out_unlock;
-	}
-
-	if (len == 0)
-		len = isize - pos_in;
-
-	/* Ensure offsets don't wrap and the input is inside i_size */
-	if (pos_in + len < pos_in || pos_out + len < pos_out ||
-	    pos_in + len > isize)
-		goto out_unlock;
-
-	/* Don't allow dedupe past EOF in the dest file */
-	if (is_dedupe) {
-		loff_t	disize;
-
-		disize = i_size_read(inode_out);
-		if (pos_out >= disize || pos_out + len > disize)
-			goto out_unlock;
-	}
-
-	/* If we're linking to EOF, continue to the block boundary. */
-	if (pos_in + len == isize)
-		blen = ALIGN(isize, bs) - pos_in;
-	else
-		blen = len;
-
-	/* Only reflink if we're aligned to block boundaries */
-	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
-	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
-		goto out_unlock;
-
-	/* Don't allow overlapped reflink within the same file */
-	if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
-		goto out_unlock;
-
-	/* Wait for the completion of any pending IOs on srcfile */
-	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
-	if (ret)
-		goto out_unlock;
-	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
-	if (ret)
-		goto out_unlock;
-
-	if (is_dedupe)
-		flags |= XFS_REFLINK_DEDUPE;
-	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
-			pos_out, len, flags);
-
-out_unlock:
-	xfs_iunlock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
-	xfs_iunlock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
-	if (!same_inode) {
-		xfs_iunlock(XFS_I(inode_out), XFS_MMAPLOCK_EXCL);
-		xfs_iunlock(XFS_I(inode_out), XFS_IOLOCK_EXCL);
-	}
-	return ret;
-}
-
 STATIC ssize_t
 xfs_file_copy_range(
 	struct file	*file_in,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d012746..11ed2ad 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1308,23 +1308,56 @@ xfs_compare_extents(
 }
 
 /*
+ * Flush all file writes out to disk.
+ */
+static int
+xfs_file_wait_for_io(
+	struct inode	*inode,
+	loff_t		offset,
+	size_t		len)
+{
+	loff_t		rounding;
+	loff_t		ioffset;
+	loff_t		iendoffset;
+	loff_t		bs;
+	int		ret;
+
+	bs = inode->i_sb->s_blocksize;
+	inode_dio_wait(inode);
+
+	rounding = max_t(xfs_off_t, bs, PAGE_SIZE);
+	ioffset = round_down(offset, rounding);
+	iendoffset = round_up(offset + len, rounding) - 1;
+	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
+					   iendoffset);
+	return ret;
+}
+
+/*
  * Link a range of blocks from one file to another.
  */
 int
-xfs_reflink_remap_range(
-	struct xfs_inode	*src,
-	xfs_off_t		srcoff,
-	struct xfs_inode	*dest,
-	xfs_off_t		destoff,
-	xfs_off_t		len,
-	unsigned int		flags)
+xfs_file_share_range(
+	struct file		*file_in,
+	loff_t			pos_in,
+	struct file		*file_out,
+	loff_t			pos_out,
+	u64			len,
+	bool			is_dedupe)
 {
+	struct inode		*inode_in = file_inode(file_in);
+	struct xfs_inode	*src = XFS_I(inode_in);
+	struct inode		*inode_out = file_inode(file_out);
+	struct xfs_inode	*dest = XFS_I(inode_out);
 	struct xfs_mount	*mp = src->i_mount;
+	loff_t			bs = inode_out->i_sb->s_blocksize;
+	bool			same_inode = (inode_in == inode_out);
 	xfs_fileoff_t		sfsbno, dfsbno;
 	xfs_filblks_t		fsblen;
-	int			error;
 	xfs_extlen_t		cowextsize;
-	bool			is_same;
+	loff_t			isize;
+	ssize_t			ret;
+	loff_t			blen;
 
 	if (!xfs_sb_version_hasreflink(&mp->m_sb))
 		return -EOPNOTSUPP;
@@ -1332,48 +1365,128 @@ xfs_reflink_remap_range(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/* Lock both files against IO */
+	if (same_inode) {
+		xfs_ilock(src, XFS_IOLOCK_EXCL);
+		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
+	} else {
+		xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
+		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
+	}
+
+	/* Don't touch certain kinds of inodes */
+	ret = -EPERM;
+	if (IS_IMMUTABLE(inode_out))
+		goto out_unlock;
+
+	ret = -ETXTBSY;
+	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
+		goto out_unlock;
+
+
+	/* Don't reflink dirs, pipes, sockets... */
+	ret = -EISDIR;
+	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
+		goto out_unlock;
+	ret = -EINVAL;
+	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
+		goto out_unlock;
+	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+		goto out_unlock;
+
 	/* Don't reflink realtime inodes */
 	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
-		return -EINVAL;
+		goto out_unlock;
+
+	/* Don't share DAX file data for now. */
+	if (IS_DAX(inode_in) || IS_DAX(inode_out))
+		goto out_unlock;
+
+	/* Are we going all the way to the end? */
+	isize = i_size_read(inode_in);
+	if (isize == 0) {
+		ret = 0;
+		goto out_unlock;
+	}
+
+	if (len == 0)
+		len = isize - pos_in;
 
-	if (flags & ~XFS_REFLINK_ALL)
-		return -EINVAL;
+	/* Ensure offsets don't wrap and the input is inside i_size */
+	if (pos_in + len < pos_in || pos_out + len < pos_out ||
+	    pos_in + len > isize)
+		goto out_unlock;
+
+	/* Don't allow dedupe past EOF in the dest file */
+	if (is_dedupe) {
+		loff_t	disize;
+
+		disize = i_size_read(inode_out);
+		if (pos_out >= disize || pos_out + len > disize)
+			goto out_unlock;
+	}
 
-	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
+	/* If we're linking to EOF, continue to the block boundary. */
+	if (pos_in + len == isize)
+		blen = ALIGN(isize, bs) - pos_in;
+	else
+		blen = len;
+
+	/* Only reflink if we're aligned to block boundaries */
+	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
+	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
+		goto out_unlock;
+
+	/* Don't allow overlapped reflink within the same file */
+	if (same_inode) {
+		if (pos_out + blen > pos_in && pos_out < pos_in + blen)
+			goto out_unlock;
+	}
+
+	/* Wait for the completion of any pending IOs on srcfile */
+	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
+	if (ret)
+		goto out_unlock;
+	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
+	if (ret)
+		goto out_unlock;
+
+	trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
 
 	/*
 	 * Check that the extents are the same.
 	 */
-	if (flags & XFS_REFLINK_DEDUPE) {
-		is_same = false;
-		error = xfs_compare_extents(VFS_I(src), srcoff, VFS_I(dest),
-				destoff, len, &is_same);
-		if (error)
-			goto out_error;
+	if (is_dedupe) {
+		bool		is_same = false;
+
+		ret = xfs_compare_extents(inode_in, pos_in, inode_out, pos_out,
+				len, &is_same);
+		if (ret)
+			goto out_unlock;;
 		if (!is_same) {
-			error = -EBADE;
-			goto out_error;
+			ret = -EBADE;
+			goto out_unlock;
 		}
 	}
 
-	error = xfs_reflink_set_inode_flag(src, dest);
-	if (error)
-		goto out_error;
+	ret = xfs_reflink_set_inode_flag(src, dest);
+	if (ret)
+		goto out_unlock;
 
 	/*
 	 * Invalidate the page cache so that we can clear any CoW mappings
 	 * in the destination file.
 	 */
-	truncate_inode_pages_range(&VFS_I(dest)->i_data, destoff,
-				   PAGE_ALIGN(destoff + len) - 1);
+	truncate_inode_pages_range(&inode_out->i_data, pos_out,
+				   PAGE_ALIGN(pos_out + len) - 1);
 
-	dfsbno = XFS_B_TO_FSBT(mp, destoff);
-	sfsbno = XFS_B_TO_FSBT(mp, srcoff);
+	dfsbno = XFS_B_TO_FSBT(mp, pos_out);
+	sfsbno = XFS_B_TO_FSBT(mp, pos_in);
 	fsblen = XFS_B_TO_FSB(mp, len);
-	error = xfs_reflink_remap_blocks(src, sfsbno, dest, dfsbno, fsblen,
-			destoff + len);
-	if (error)
-		goto out_error;
+	ret = xfs_reflink_remap_blocks(src, sfsbno, dest, dfsbno, fsblen,
+			pos_out + len);
+	if (ret)
+		goto out_unlock;
 
 	/*
 	 * Carry the cowextsize hint from src to dest if we're sharing the
@@ -1381,20 +1494,24 @@ xfs_reflink_remap_range(
 	 * has a cowextsize hint, and the destination file does not.
 	 */
 	cowextsize = 0;
-	if (srcoff == 0 && len == i_size_read(VFS_I(src)) &&
+	if (pos_in == 0 && len == i_size_read(inode_in) &&
 	    (src->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
-	    destoff == 0 && len >= i_size_read(VFS_I(dest)) &&
+	    pos_out == 0 && len >= i_size_read(inode_out) &&
 	    !(dest->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE))
 		cowextsize = src->i_d.di_cowextsize;
 
-	error = xfs_reflink_update_dest(dest, destoff + len, cowextsize);
-	if (error)
-		goto out_error;
+	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize);
 
-out_error:
-	if (error)
-		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
-	return error;
+out_unlock:
+	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(src, XFS_IOLOCK_EXCL);
+	if (src->i_ino != dest->i_ino) {
+		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
+		xfs_iunlock(dest, XFS_IOLOCK_EXCL);
+	}
+	if (ret)
+		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
+	return ret;
 }
 
 /*
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 5dc3c8a..25a8956 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -43,11 +43,6 @@ extern int xfs_reflink_cancel_cow_range(struct xfs_inode *ip, xfs_off_t offset,
 extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 extern int xfs_reflink_recover_cow(struct xfs_mount *mp);
-#define XFS_REFLINK_DEDUPE	1	/* only reflink if contents match */
-#define XFS_REFLINK_ALL		(XFS_REFLINK_DEDUPE)
-extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
-		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
-		unsigned int flags);
 extern int xfs_reflink_clear_inode_flag(struct xfs_inode *ip,
 		struct xfs_trans **tpp);
 extern int xfs_reflink_unshare(struct xfs_inode *ip, xfs_off_t offset,
@@ -55,4 +50,7 @@ extern int xfs_reflink_unshare(struct xfs_inode *ip, xfs_off_t offset,
 
 extern bool xfs_reflink_has_real_cow_blocks(struct xfs_inode *ip);
 
+int xfs_file_share_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, u64 len, bool is_dedupe);
+
 #endif /* __XFS_REFLINK_H */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range
  2016-10-17 12:05 ` [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range Christoph Hellwig
@ 2016-10-17 21:11   ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2016-10-17 21:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 17, 2016 at 02:05:17PM +0200, Christoph Hellwig wrote:
> The VFS already does the check, and the placement of this duplicate is in
> the way of the following locking rework.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_file.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a314fc7..20563f2 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1003,9 +1003,6 @@ xfs_file_share_range(
>  	    IS_SWAPFILE(inode_out))
>  		return -ETXTBSY;
>  
> -	/* Reflink only works within this filesystem. */
> -	if (inode_in->i_sb != inode_out->i_sb)
> -		return -EXDEV;
>  	same_inode = (inode_in->i_ino == inode_out->i_ino);
>  
>  	/* Don't reflink dirs, pipes, sockets... */
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range
  2016-10-17 12:05 ` [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range Christoph Hellwig
@ 2016-10-17 21:12   ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2016-10-17 21:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 17, 2016 at 02:05:18PM +0200, Christoph Hellwig wrote:
> The VFS i_ino is an unsigned long, while XFS inode numbers are 64-bit
> wide, so checking i_ino for equality could lead to rate false positives
> on 32-bit architectures.  Just compare the inode pointers themselves
> to be safe.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 20563f2..012a960 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1003,7 +1003,7 @@ xfs_file_share_range(
>  	    IS_SWAPFILE(inode_out))
>  		return -ETXTBSY;
>  
> -	same_inode = (inode_in->i_ino == inode_out->i_ino);
> +	same_inode = (inode_in == inode_out);
>  
>  	/* Don't reflink dirs, pipes, sockets... */
>  	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
  2016-10-17 12:05 ` [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range Christoph Hellwig
@ 2016-10-17 21:16   ` Darrick J. Wong
  0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2016-10-17 21:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 17, 2016 at 02:05:19PM +0200, Christoph Hellwig wrote:
> We need the iolock protection to stabilizie the IS_SWAPFILE and IS_IMMUTABLE
> values, as well as preventing new buffered writers re-dirtying the file data
> that we just wrote out.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_file.c    | 62 ++++++++++++++++++++++++++++++++++------------------
>  fs/xfs/xfs_reflink.c | 15 -------------
>  2 files changed, 41 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 012a960..0960264 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -996,38 +996,54 @@ xfs_file_share_range(
>  	inode_out = file_inode(file_out);
>  	bs = inode_out->i_sb->s_blocksize;
>  
> +	/* Lock both files against IO */
> +	same_inode = (inode_in == inode_out);
> +	if (same_inode) {
> +		xfs_ilock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
> +		xfs_ilock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
> +	} else {
> +		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
> +				XFS_IOLOCK_EXCL);
> +		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
> +				XFS_MMAPLOCK_EXCL);
> +	}
> +
>  	/* Don't touch certain kinds of inodes */
> +	ret = -EPERM;
>  	if (IS_IMMUTABLE(inode_out))
> -		return -EPERM;
> -	if (IS_SWAPFILE(inode_in) ||
> -	    IS_SWAPFILE(inode_out))
> -		return -ETXTBSY;
> -
> -	same_inode = (inode_in == inode_out);
> +		goto out_unlock;
> +	ret = -ETXTBSY;
> +	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
> +		goto out_unlock;
>  
>  	/* Don't reflink dirs, pipes, sockets... */
> +	ret = -EISDIR;
>  	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> -		return -EISDIR;
> +		goto out_unlock;
> +	ret = -EINVAL;
>  	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
> -		return -EINVAL;
> +		goto out_unlock;
>  	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> -		return -EINVAL;
> +		goto out_unlock;
>  
>  	/* Don't share DAX file data for now. */
>  	if (IS_DAX(inode_in) || IS_DAX(inode_out))
> -		return -EINVAL;
> +		goto out_unlock;
>  
>  	/* Are we going all the way to the end? */
>  	isize = i_size_read(inode_in);
> -	if (isize == 0)
> -		return 0;
> +	if (isize == 0) {
> +		ret = 0;
> +		goto out_unlock;
> +	}
> +
>  	if (len == 0)
>  		len = isize - pos_in;
>  
>  	/* Ensure offsets don't wrap and the input is inside i_size */
>  	if (pos_in + len < pos_in || pos_out + len < pos_out ||
>  	    pos_in + len > isize)
> -		return -EINVAL;
> +		goto out_unlock;
>  
>  	/* Don't allow dedupe past EOF in the dest file */
>  	if (is_dedupe) {
> @@ -1035,7 +1051,7 @@ xfs_file_share_range(
>  
>  		disize = i_size_read(inode_out);
>  		if (pos_out >= disize || pos_out + len > disize)
> -			return -EINVAL;
> +			goto out_unlock;
>  	}
>  
>  	/* If we're linking to EOF, continue to the block boundary. */
> @@ -1047,28 +1063,32 @@ xfs_file_share_range(
>  	/* Only reflink if we're aligned to block boundaries */
>  	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
>  	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
> -		return -EINVAL;
> +		goto out_unlock;
>  
>  	/* Don't allow overlapped reflink within the same file */
>  	if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
> -		return -EINVAL;
> +		goto out_unlock;
>  
>  	/* Wait for the completion of any pending IOs on srcfile */
>  	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
>  	if (ret)
> -		goto out;
> +		goto out_unlock;
>  	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
>  	if (ret)
> -		goto out;
> +		goto out_unlock;
>  
>  	if (is_dedupe)
>  		flags |= XFS_REFLINK_DEDUPE;
>  	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
>  			pos_out, len, flags);
> -	if (ret < 0)
> -		goto out;
>  
> -out:
> +out_unlock:
> +	xfs_iunlock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
> +	if (!same_inode) {
> +		xfs_iunlock(XFS_I(inode_out), XFS_MMAPLOCK_EXCL);
> +		xfs_iunlock(XFS_I(inode_out), XFS_IOLOCK_EXCL);
> +	}
>  	return ret;
>  }
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 5965e94..d012746 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1341,15 +1341,6 @@ xfs_reflink_remap_range(
>  
>  	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
>  
> -	/* Lock both files against IO */
> -	if (src->i_ino == dest->i_ino) {
> -		xfs_ilock(src, XFS_IOLOCK_EXCL);
> -		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> -	} else {
> -		xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
> -		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
> -	}
> -
>  	/*
>  	 * Check that the extents are the same.
>  	 */
> @@ -1401,12 +1392,6 @@ xfs_reflink_remap_range(
>  		goto out_error;
>  
>  out_error:
> -	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
> -	xfs_iunlock(src, XFS_IOLOCK_EXCL);
> -	if (src->i_ino != dest->i_ino) {
> -		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> -		xfs_iunlock(dest, XFS_IOLOCK_EXCL);
> -	}
>  	if (error)
>  		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
>  	return error;
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range
  2016-10-17 12:05 ` [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range Christoph Hellwig
@ 2016-10-17 21:29   ` Darrick J. Wong
  2016-10-18  5:17     ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2016-10-17 21:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 17, 2016 at 02:05:20PM +0200, Christoph Hellwig wrote:
> There is no clear division of responsibility between those functions, so
> just merge them into one to keep the code simple.  Also move
> xfs_file_wait_for_io to xfs_reflink.c together with its only caller.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c    | 145 -------------------------------------
>  fs/xfs/xfs_reflink.c | 199 ++++++++++++++++++++++++++++++++++++++++-----------
>  fs/xfs/xfs_reflink.h |   8 +--
>  3 files changed, 161 insertions(+), 191 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0960264..cd37573 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -947,151 +947,6 @@ xfs_file_fallocate(
>  	return error;
>  }
>  
> -/*
> - * Flush all file writes out to disk.
> - */
> -static int
> -xfs_file_wait_for_io(
> -	struct inode	*inode,
> -	loff_t		offset,
> -	size_t		len)
> -{
> -	loff_t		rounding;
> -	loff_t		ioffset;
> -	loff_t		iendoffset;
> -	loff_t		bs;
> -	int		ret;
> -
> -	bs = inode->i_sb->s_blocksize;
> -	inode_dio_wait(inode);
> -
> -	rounding = max_t(xfs_off_t, bs, PAGE_SIZE);
> -	ioffset = round_down(offset, rounding);
> -	iendoffset = round_up(offset + len, rounding) - 1;
> -	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
> -					   iendoffset);
> -	return ret;
> -}
> -
> -/* Hook up to the VFS reflink function */
> -STATIC int
> -xfs_file_share_range(
> -	struct file	*file_in,
> -	loff_t		pos_in,
> -	struct file	*file_out,
> -	loff_t		pos_out,
> -	u64		len,
> -	bool		is_dedupe)
> -{
> -	struct inode	*inode_in;
> -	struct inode	*inode_out;
> -	ssize_t		ret;
> -	loff_t		bs;
> -	loff_t		isize;
> -	int		same_inode;
> -	loff_t		blen;
> -	unsigned int	flags = 0;
> -
> -	inode_in = file_inode(file_in);
> -	inode_out = file_inode(file_out);
> -	bs = inode_out->i_sb->s_blocksize;
> -
> -	/* Lock both files against IO */
> -	same_inode = (inode_in == inode_out);
> -	if (same_inode) {
> -		xfs_ilock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
> -		xfs_ilock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
> -	} else {
> -		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
> -				XFS_IOLOCK_EXCL);
> -		xfs_lock_two_inodes(XFS_I(inode_in), XFS_I(inode_out),
> -				XFS_MMAPLOCK_EXCL);
> -	}
> -
> -	/* Don't touch certain kinds of inodes */
> -	ret = -EPERM;
> -	if (IS_IMMUTABLE(inode_out))
> -		goto out_unlock;
> -	ret = -ETXTBSY;
> -	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
> -		goto out_unlock;
> -
> -	/* Don't reflink dirs, pipes, sockets... */
> -	ret = -EISDIR;
> -	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> -		goto out_unlock;
> -	ret = -EINVAL;
> -	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
> -		goto out_unlock;
> -	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> -		goto out_unlock;
> -
> -	/* Don't share DAX file data for now. */
> -	if (IS_DAX(inode_in) || IS_DAX(inode_out))
> -		goto out_unlock;
> -
> -	/* Are we going all the way to the end? */
> -	isize = i_size_read(inode_in);
> -	if (isize == 0) {
> -		ret = 0;
> -		goto out_unlock;
> -	}
> -
> -	if (len == 0)
> -		len = isize - pos_in;
> -
> -	/* Ensure offsets don't wrap and the input is inside i_size */
> -	if (pos_in + len < pos_in || pos_out + len < pos_out ||
> -	    pos_in + len > isize)
> -		goto out_unlock;
> -
> -	/* Don't allow dedupe past EOF in the dest file */
> -	if (is_dedupe) {
> -		loff_t	disize;
> -
> -		disize = i_size_read(inode_out);
> -		if (pos_out >= disize || pos_out + len > disize)
> -			goto out_unlock;
> -	}
> -
> -	/* If we're linking to EOF, continue to the block boundary. */
> -	if (pos_in + len == isize)
> -		blen = ALIGN(isize, bs) - pos_in;
> -	else
> -		blen = len;
> -
> -	/* Only reflink if we're aligned to block boundaries */
> -	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
> -	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
> -		goto out_unlock;
> -
> -	/* Don't allow overlapped reflink within the same file */
> -	if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
> -		goto out_unlock;
> -
> -	/* Wait for the completion of any pending IOs on srcfile */
> -	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
> -	if (ret)
> -		goto out_unlock;
> -	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
> -	if (ret)
> -		goto out_unlock;
> -
> -	if (is_dedupe)
> -		flags |= XFS_REFLINK_DEDUPE;
> -	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
> -			pos_out, len, flags);
> -
> -out_unlock:
> -	xfs_iunlock(XFS_I(inode_in), XFS_MMAPLOCK_EXCL);
> -	xfs_iunlock(XFS_I(inode_in), XFS_IOLOCK_EXCL);
> -	if (!same_inode) {
> -		xfs_iunlock(XFS_I(inode_out), XFS_MMAPLOCK_EXCL);
> -		xfs_iunlock(XFS_I(inode_out), XFS_IOLOCK_EXCL);
> -	}
> -	return ret;
> -}
> -
>  STATIC ssize_t
>  xfs_file_copy_range(
>  	struct file	*file_in,
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index d012746..11ed2ad 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1308,23 +1308,56 @@ xfs_compare_extents(
>  }
>  
>  /*
> + * Flush all file writes out to disk.
> + */
> +static int
> +xfs_file_wait_for_io(
> +	struct inode	*inode,
> +	loff_t		offset,
> +	size_t		len)
> +{
> +	loff_t		rounding;
> +	loff_t		ioffset;
> +	loff_t		iendoffset;
> +	loff_t		bs;
> +	int		ret;
> +
> +	bs = inode->i_sb->s_blocksize;
> +	inode_dio_wait(inode);
> +
> +	rounding = max_t(xfs_off_t, bs, PAGE_SIZE);
> +	ioffset = round_down(offset, rounding);
> +	iendoffset = round_up(offset + len, rounding) - 1;
> +	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
> +					   iendoffset);
> +	return ret;
> +}

This seems like a file action not specific to reflink.

> +
> +/*
>   * Link a range of blocks from one file to another.
>   */
>  int
> -xfs_reflink_remap_range(
> -	struct xfs_inode	*src,
> -	xfs_off_t		srcoff,
> -	struct xfs_inode	*dest,
> -	xfs_off_t		destoff,
> -	xfs_off_t		len,
> -	unsigned int		flags)
> +xfs_file_share_range(

xfs_reflink_share_file_range() ?

I'd like to maintain the convention that all reflink functions
start with xfs_reflink_*, particularly since the xfs_file_* functions
largely live in xfs_file.c.

> +	struct file		*file_in,
> +	loff_t			pos_in,
> +	struct file		*file_out,
> +	loff_t			pos_out,
> +	u64			len,
> +	bool			is_dedupe)
>  {
> +	struct inode		*inode_in = file_inode(file_in);
> +	struct xfs_inode	*src = XFS_I(inode_in);
> +	struct inode		*inode_out = file_inode(file_out);
> +	struct xfs_inode	*dest = XFS_I(inode_out);
>  	struct xfs_mount	*mp = src->i_mount;
> +	loff_t			bs = inode_out->i_sb->s_blocksize;
> +	bool			same_inode = (inode_in == inode_out);
>  	xfs_fileoff_t		sfsbno, dfsbno;
>  	xfs_filblks_t		fsblen;
> -	int			error;
>  	xfs_extlen_t		cowextsize;
> -	bool			is_same;
> +	loff_t			isize;
> +	ssize_t			ret;
> +	loff_t			blen;
>  
>  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
>  		return -EOPNOTSUPP;
> @@ -1332,48 +1365,128 @@ xfs_reflink_remap_range(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> +	/* Lock both files against IO */
> +	if (same_inode) {
> +		xfs_ilock(src, XFS_IOLOCK_EXCL);
> +		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
> +	} else {
> +		xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
> +		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
> +	}
> +
> +	/* Don't touch certain kinds of inodes */
> +	ret = -EPERM;
> +	if (IS_IMMUTABLE(inode_out))
> +		goto out_unlock;
> +
> +	ret = -ETXTBSY;
> +	if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
> +		goto out_unlock;
> +
> +
> +	/* Don't reflink dirs, pipes, sockets... */
> +	ret = -EISDIR;
> +	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> +		goto out_unlock;
> +	ret = -EINVAL;
> +	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
> +		goto out_unlock;
> +	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> +		goto out_unlock;
> +
>  	/* Don't reflink realtime inodes */
>  	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
> -		return -EINVAL;
> +		goto out_unlock;
> +
> +	/* Don't share DAX file data for now. */
> +	if (IS_DAX(inode_in) || IS_DAX(inode_out))
> +		goto out_unlock;
> +
> +	/* Are we going all the way to the end? */
> +	isize = i_size_read(inode_in);
> +	if (isize == 0) {
> +		ret = 0;
> +		goto out_unlock;
> +	}
> +
> +	if (len == 0)
> +		len = isize - pos_in;
>  
> -	if (flags & ~XFS_REFLINK_ALL)
> -		return -EINVAL;
> +	/* Ensure offsets don't wrap and the input is inside i_size */
> +	if (pos_in + len < pos_in || pos_out + len < pos_out ||
> +	    pos_in + len > isize)
> +		goto out_unlock;
> +
> +	/* Don't allow dedupe past EOF in the dest file */
> +	if (is_dedupe) {
> +		loff_t	disize;
> +
> +		disize = i_size_read(inode_out);
> +		if (pos_out >= disize || pos_out + len > disize)
> +			goto out_unlock;
> +	}
>  
> -	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
> +	/* If we're linking to EOF, continue to the block boundary. */
> +	if (pos_in + len == isize)
> +		blen = ALIGN(isize, bs) - pos_in;
> +	else
> +		blen = len;
> +
> +	/* Only reflink if we're aligned to block boundaries */
> +	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
> +	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
> +		goto out_unlock;
> +
> +	/* Don't allow overlapped reflink within the same file */
> +	if (same_inode) {
> +		if (pos_out + blen > pos_in && pos_out < pos_in + blen)
> +			goto out_unlock;
> +	}
> +
> +	/* Wait for the completion of any pending IOs on srcfile */
> +	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
> +	if (ret)
> +		goto out_unlock;
> +	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
> +	if (ret)
> +		goto out_unlock;
> +
> +	trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
>  
>  	/*
>  	 * Check that the extents are the same.
>  	 */
> -	if (flags & XFS_REFLINK_DEDUPE) {
> -		is_same = false;
> -		error = xfs_compare_extents(VFS_I(src), srcoff, VFS_I(dest),
> -				destoff, len, &is_same);
> -		if (error)
> -			goto out_error;
> +	if (is_dedupe) {
> +		bool		is_same = false;
> +
> +		ret = xfs_compare_extents(inode_in, pos_in, inode_out, pos_out,
> +				len, &is_same);
> +		if (ret)
> +			goto out_unlock;;

Double-semicolon here.

>  		if (!is_same) {
> -			error = -EBADE;
> -			goto out_error;
> +			ret = -EBADE;
> +			goto out_unlock;
>  		}
>  	}
>  
> -	error = xfs_reflink_set_inode_flag(src, dest);
> -	if (error)
> -		goto out_error;
> +	ret = xfs_reflink_set_inode_flag(src, dest);
> +	if (ret)
> +		goto out_unlock;
>  
>  	/*
>  	 * Invalidate the page cache so that we can clear any CoW mappings
>  	 * in the destination file.
>  	 */
> -	truncate_inode_pages_range(&VFS_I(dest)->i_data, destoff,
> -				   PAGE_ALIGN(destoff + len) - 1);
> +	truncate_inode_pages_range(&inode_out->i_data, pos_out,
> +				   PAGE_ALIGN(pos_out + len) - 1);
>  
> -	dfsbno = XFS_B_TO_FSBT(mp, destoff);
> -	sfsbno = XFS_B_TO_FSBT(mp, srcoff);
> +	dfsbno = XFS_B_TO_FSBT(mp, pos_out);
> +	sfsbno = XFS_B_TO_FSBT(mp, pos_in);
>  	fsblen = XFS_B_TO_FSB(mp, len);
> -	error = xfs_reflink_remap_blocks(src, sfsbno, dest, dfsbno, fsblen,
> -			destoff + len);
> -	if (error)
> -		goto out_error;
> +	ret = xfs_reflink_remap_blocks(src, sfsbno, dest, dfsbno, fsblen,
> +			pos_out + len);
> +	if (ret)
> +		goto out_unlock;
>  
>  	/*
>  	 * Carry the cowextsize hint from src to dest if we're sharing the
> @@ -1381,20 +1494,24 @@ xfs_reflink_remap_range(
>  	 * has a cowextsize hint, and the destination file does not.
>  	 */
>  	cowextsize = 0;
> -	if (srcoff == 0 && len == i_size_read(VFS_I(src)) &&
> +	if (pos_in == 0 && len == i_size_read(inode_in) &&
>  	    (src->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
> -	    destoff == 0 && len >= i_size_read(VFS_I(dest)) &&
> +	    pos_out == 0 && len >= i_size_read(inode_out) &&
>  	    !(dest->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE))
>  		cowextsize = src->i_d.di_cowextsize;
>  
> -	error = xfs_reflink_update_dest(dest, destoff + len, cowextsize);
> -	if (error)
> -		goto out_error;
> +	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize);
>  
> -out_error:
> -	if (error)
> -		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
> -	return error;
> +out_unlock:
> +	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(src, XFS_IOLOCK_EXCL);
> +	if (src->i_ino != dest->i_ino) {
> +		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
> +		xfs_iunlock(dest, XFS_IOLOCK_EXCL);
> +	}
> +	if (ret)
> +		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
> +	return ret;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 5dc3c8a..25a8956 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -43,11 +43,6 @@ extern int xfs_reflink_cancel_cow_range(struct xfs_inode *ip, xfs_off_t offset,
>  extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
>  extern int xfs_reflink_recover_cow(struct xfs_mount *mp);
> -#define XFS_REFLINK_DEDUPE	1	/* only reflink if contents match */
> -#define XFS_REFLINK_ALL		(XFS_REFLINK_DEDUPE)

Hooray, uglyflags went away! :)

Excepting the minor things I complained about, the rest is
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> -extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
> -		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
> -		unsigned int flags);
>  extern int xfs_reflink_clear_inode_flag(struct xfs_inode *ip,
>  		struct xfs_trans **tpp);
>  extern int xfs_reflink_unshare(struct xfs_inode *ip, xfs_off_t offset,
> @@ -55,4 +50,7 @@ extern int xfs_reflink_unshare(struct xfs_inode *ip, xfs_off_t offset,
>  
>  extern bool xfs_reflink_has_real_cow_blocks(struct xfs_inode *ip);
>  
> +int xfs_file_share_range(struct file *file_in, loff_t pos_in,
> +		struct file *file_out, loff_t pos_out, u64 len, bool is_dedupe);
> +
>  #endif /* __XFS_REFLINK_H */
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: fix locking for the reflink operation
  2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
                   ` (3 preceding siblings ...)
  2016-10-17 12:05 ` [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range Christoph Hellwig
@ 2016-10-17 21:35 ` Darrick J. Wong
  4 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2016-10-17 21:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Oct 17, 2016 at 02:05:16PM +0200, Christoph Hellwig wrote:
> When creating a reflink we need to take the iolock much earlier, as
> various early checks done in xfs_file_share_range currently are racy
> without it.  Patches 1-3 sort that out in a minimal invasive way,
> but I think we should just merge xfs_file_share_range and
> xfs_reflink_remap_range, which is what patch 4 does.
> 
> Patches 1-3 are something I'd like to see in 4.9, patch 4 might not
> fully qualify, but just getting it in might make everyones life easier.

This series (+ the CoW optimization series before it) seem to run ok here.
I'm ok with (more soak testing and) sending it in for 4.9.

--D

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range
  2016-10-17 21:29   ` Darrick J. Wong
@ 2016-10-18  5:17     ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2016-10-18  5:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Mon, Oct 17, 2016 at 02:29:33PM -0700, Darrick J. Wong wrote:
> > +	bs = inode->i_sb->s_blocksize;
> > +	inode_dio_wait(inode);
> > +
> > +	rounding = max_t(xfs_off_t, bs, PAGE_SIZE);
> > +	ioffset = round_down(offset, rounding);
> > +	iendoffset = round_up(offset + len, rounding) - 1;
> > +	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
> > +					   iendoffset);
> > +	return ret;
> > +}
> 
> This seems like a file action not specific to reflink.

But it's only used by the reflink code :)

That being said given that filemap_write_and_wait_range operates
on pages there is no need for the rounding anyway, and we could
just replace it with open coded calls to inode_dio_wait and
filemap_write_and_wait_range.  Maybe I should do that before
this patch so we don't have to bother moving it.

> > +xfs_file_share_range(
> 
> xfs_reflink_share_file_range() ?
> 
> I'd like to maintain the convention that all reflink functions
> start with xfs_reflink_*, particularly since the xfs_file_* functions
> largely live in xfs_file.c.

Ok, fine.

> > +	if (is_dedupe) {
> > +		bool		is_same = false;
> > +
> > +		ret = xfs_compare_extents(inode_in, pos_in, inode_out, pos_out,
> > +				len, &is_same);
> > +		if (ret)
> > +			goto out_unlock;;
> 
> Double-semicolon here.

I'll fix it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-10-18  5:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 12:05 fix locking for the reflink operation Christoph Hellwig
2016-10-17 12:05 ` [PATCH 1/4] xfs: remove the same fs check from xfs_file_share_range Christoph Hellwig
2016-10-17 21:11   ` Darrick J. Wong
2016-10-17 12:05 ` [PATCH 2/4] xfs: fix the same_inode check in xfs_file_share_range Christoph Hellwig
2016-10-17 21:12   ` Darrick J. Wong
2016-10-17 12:05 ` [PATCH 3/4] xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range Christoph Hellwig
2016-10-17 21:16   ` Darrick J. Wong
2016-10-17 12:05 ` [PATCH 4/4] xfs: merge xfs_reflink_remap_range and xfs_file_share_range Christoph Hellwig
2016-10-17 21:29   ` Darrick J. Wong
2016-10-18  5:17     ` Christoph Hellwig
2016-10-17 21:35 ` fix locking for the reflink operation Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.