All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: sandeen@redhat.com, linux-nfs@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-unionfs@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	ocfs2-devel@oss.oracle.com, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org
Subject: [PATCH 05/28] vfs: avoid problematic remapping requests into partial EOF block
Date: Sun, 21 Oct 2018 09:15:37 -0700	[thread overview]
Message-ID: <154013853780.29026.5441191187672186537.stgit@magnolia> (raw)
In-Reply-To: <154013850285.29026.16168387526580596209.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions  only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the souce file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the souce file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)


diff --git a/fs/read_write.c b/fs/read_write.c
index 2456da3f8a41..0f0a6efdd502 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1708,6 +1708,34 @@ static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
 
 	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
 }
+/*
+ * Ensure that we don't remap a partial EOF block in the middle of something
+ * else.  Assume that the offsets have already been checked for block
+ * alignment.
+ *
+ * For deduplication we always scale down to the previous block because we
+ * can't meaningfully compare post-EOF contents.
+ *
+ * For clone we only link a partial EOF block above the destination file's EOF.
+ */
+static int generic_remap_check_len(struct inode *inode_in,
+				   struct inode *inode_out,
+				   loff_t pos_out,
+				   u64 *len,
+				   bool is_dedupe)
+{
+	u64 blkmask = i_blocksize(inode_in) - 1;
+
+	if ((*len & blkmask) == 0)
+		return 0;
+
+	if (is_dedupe)
+		*len &= ~blkmask;
+	else if (pos_out + *len < i_size_read(inode_out))
+		return -EINVAL;
+
+	return 0;
+}
 
 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -1787,6 +1815,11 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in,
 			return -EBADE;
 	}
 
+	ret = generic_remap_check_len(inode_in, inode_out, pos_out, len,
+			is_dedupe);
+	if (ret)
+		return ret;
+
 	return 1;
 }
 EXPORT_SYMBOL(vfs_clone_file_prep);

WARNING: multiple messages have this Message-ID
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: sandeen@redhat.com, linux-nfs@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-unionfs@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>,
	ocfs2-devel@oss.oracle.com
Subject: [PATCH 05/28] vfs: avoid problematic remapping requests into partial EOF block
Date: Sun, 21 Oct 2018 09:15:37 -0700	[thread overview]
Message-ID: <154013853780.29026.5441191187672186537.stgit@magnolia> (raw)
In-Reply-To: <154013850285.29026.16168387526580596209.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions  only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the souce file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the souce file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)


diff --git a/fs/read_write.c b/fs/read_write.c
index 2456da3f8a41..0f0a6efdd502 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1708,6 +1708,34 @@ static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
 
 	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
 }
+/*
+ * Ensure that we don't remap a partial EOF block in the middle of something
+ * else.  Assume that the offsets have already been checked for block
+ * alignment.
+ *
+ * For deduplication we always scale down to the previous block because we
+ * can't meaningfully compare post-EOF contents.
+ *
+ * For clone we only link a partial EOF block above the destination file's EOF.
+ */
+static int generic_remap_check_len(struct inode *inode_in,
+				   struct inode *inode_out,
+				   loff_t pos_out,
+				   u64 *len,
+				   bool is_dedupe)
+{
+	u64 blkmask = i_blocksize(inode_in) - 1;
+
+	if ((*len & blkmask) == 0)
+		return 0;
+
+	if (is_dedupe)
+		*len &= ~blkmask;
+	else if (pos_out + *len < i_size_read(inode_out))
+		return -EINVAL;
+
+	return 0;
+}
 
 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -1787,6 +1815,11 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in,
 			return -EBADE;
 	}
 
+	ret = generic_remap_check_len(inode_in, inode_out, pos_out, len,
+			is_dedupe);
+	if (ret)
+		return ret;
+
 	return 1;
 }
 EXPORT_SYMBOL(vfs_clone_file_prep);


WARNING: multiple messages have this Message-ID
From: Darrick J. Wong <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: sandeen@redhat.com, linux-nfs@vger.kernel.org,
	linux-cifs@vger.kernel.org, linux-unionfs@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>,
	ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 05/28] vfs: avoid problematic remapping requests into partial EOF block
Date: Sun, 21 Oct 2018 09:15:37 -0700	[thread overview]
Message-ID: <154013853780.29026.5441191187672186537.stgit@magnolia> (raw)
In-Reply-To: <154013850285.29026.16168387526580596209.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

A deduplication data corruption is exposed in XFS and btrfs. It is
caused by extending the block match range to include the partial EOF
block, but then allowing unknown data beyond EOF to be considered a
"match" to data in the destination file because the comparison is only
made to the end of the source file. This corrupts the destination file
when the source extent is shared with it.

The VFS remapping prep functions  only support whole block dedupe, but
we still need to appear to support whole file dedupe correctly.  Hence
if the dedupe request includes the last block of the souce file, don't
include it in the actual dedupe operation. If the rest of the range
dedupes successfully, then reject the entire request.  A subsequent
patch will enable us to shorten dedupe requests correctly.

When reflinking sub-file ranges, a data corruption can occur when the
source file range includes a partial EOF block. This shares the unknown
data beyond EOF into the second file at a position inside EOF, exposing
stale data in the second file.

If the reflink request includes the last block of the souce file, only
proceed with the reflink operation if it lands at or past the
destination file's current EOF. If it lands within the destination file
EOF, reject the entire request with -EINVAL and make the caller go the
hard way.  A subsequent patch will enable us to shorten reflink requests
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)


diff --git a/fs/read_write.c b/fs/read_write.c
index 2456da3f8a41..0f0a6efdd502 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1708,6 +1708,34 @@ static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
 
 	return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
 }
+/*
+ * Ensure that we don't remap a partial EOF block in the middle of something
+ * else.  Assume that the offsets have already been checked for block
+ * alignment.
+ *
+ * For deduplication we always scale down to the previous block because we
+ * can't meaningfully compare post-EOF contents.
+ *
+ * For clone we only link a partial EOF block above the destination file's EOF.
+ */
+static int generic_remap_check_len(struct inode *inode_in,
+				   struct inode *inode_out,
+				   loff_t pos_out,
+				   u64 *len,
+				   bool is_dedupe)
+{
+	u64 blkmask = i_blocksize(inode_in) - 1;
+
+	if ((*len & blkmask) == 0)
+		return 0;
+
+	if (is_dedupe)
+		*len &= ~blkmask;
+	else if (pos_out + *len < i_size_read(inode_out))
+		return -EINVAL;
+
+	return 0;
+}
 
 /*
  * Check that the two inodes are eligible for cloning, the ranges make
@@ -1787,6 +1815,11 @@ int vfs_clone_file_prep(struct file *file_in, loff_t pos_in,
 			return -EBADE;
 	}
 
+	ret = generic_remap_check_len(inode_in, inode_out, pos_out, len,
+			is_dedupe);
+	if (ret)
+		return ret;
+
 	return 1;
 }
 EXPORT_SYMBOL(vfs_clone_file_prep);

  parent reply	other threads:[~2018-10-21 16:15 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-21 16:15 [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems Darrick J. Wong
2018-10-21 16:15 ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 01/28] vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 02/28] vfs: check file ranges before cloning files Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 03/28] vfs: exit early from zero length remap operations Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 04/28] vfs: strengthen checking of file range inputs to generic_remap_checks Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` Darrick J. Wong [this message]
2018-10-21 16:15   ` [Ocfs2-devel] [PATCH 05/28] vfs: avoid problematic remapping requests into partial EOF block Darrick J. Wong
2018-10-21 16:15   ` Darrick J. Wong
2018-10-21 16:15 ` [PATCH 06/28] vfs: skip zero-length dedupe requests Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 07/28] vfs: rename vfs_clone_file_prep to be more descriptive Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:15 ` [PATCH 08/28] vfs: rename clone_verify_area to remap_verify_area Darrick J. Wong
2018-10-21 16:15   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 09/28] vfs: combine the clone and dedupe into a single remap_file_range Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 10/28] vfs: pass remap flags to generic_remap_file_range_prep Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 11/28] vfs: pass remap flags to generic_remap_checks Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16   ` Darrick J. Wong
2018-10-21 16:16 ` [PATCH 12/28] vfs: remap helper should update destination inode metadata Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 13/28] vfs: make remap_file_range functions take and return bytes completed Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 14/28] vfs: plumb remap flags through the vfs clone functions Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 15/28] vfs: plumb remap flags through the vfs dedupe functions Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:16 ` [PATCH 16/28] vfs: enable remap callers that can handle short operations Darrick J. Wong
2018-10-21 16:16   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17 ` [PATCH 17/28] vfs: hide file range comparison function Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17   ` Darrick J. Wong
2018-10-21 16:17 ` [PATCH 18/28] vfs: clean up generic_remap_file_range_prep return value Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17   ` Darrick J. Wong
2018-10-21 16:17 ` [PATCH 19/28] ocfs2: truncate page cache for clone destination file before remapping Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17 ` [PATCH 20/28] ocfs2: fix pagecache truncation prior to reflink Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17 ` [PATCH 21/28] ocfs2: support partial clone range and dedupe range Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17 ` [PATCH 22/28] ocfs2: remove ocfs2_reflink_remap_range Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17 ` [PATCH 23/28] xfs: fix pagecache truncation prior to reflink Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17   ` Darrick J. Wong
2018-10-21 16:17 ` [PATCH 24/28] xfs: clean up xfs_reflink_remap_blocks call site Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-22  2:11   ` Dave Chinner
2018-10-22  2:11     ` [Ocfs2-devel] " Dave Chinner
2018-10-21 16:17 ` [PATCH 25/28] xfs: support returning partial reflink results Darrick J. Wong
2018-10-21 16:17   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:17   ` Darrick J. Wong
2018-10-22  2:14   ` Dave Chinner
2018-10-22  2:14     ` [Ocfs2-devel] " Dave Chinner
2018-10-21 16:18 ` [PATCH 26/28] xfs: remove redundant remap partial EOF block checks Darrick J. Wong
2018-10-21 16:18   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-21 16:18 ` [PATCH 27/28] xfs: remove xfs_reflink_remap_range Darrick J. Wong
2018-10-21 16:18   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-22  2:17   ` Dave Chinner
2018-10-22  2:17     ` [Ocfs2-devel] " Dave Chinner
2018-10-21 16:18 ` [PATCH 28/28] xfs: remove [cm]time update from reflink calls Darrick J. Wong
2018-10-21 16:18   ` [Ocfs2-devel] " Darrick J. Wong
2018-10-22  2:18   ` Dave Chinner
2018-10-22  2:18     ` [Ocfs2-devel] " Dave Chinner
2018-10-22  2:21 ` [PATCH v6 00/28] fs: fixes for serious clone/dedupe problems Dave Chinner
2018-10-22  2:21   ` [Ocfs2-devel] " Dave Chinner
2018-10-22  4:37   ` Dave Chinner
2018-10-22  4:37     ` [Ocfs2-devel] " Dave Chinner
2018-10-22  4:52     ` Al Viro
2018-10-22  4:52       ` [Ocfs2-devel] " Al Viro
2018-10-22  5:08       ` Dave Chinner
2018-10-22  5:08         ` [Ocfs2-devel] " Dave Chinner
2018-10-22  5:42         ` Amir Goldstein
2018-10-22  6:55           ` Dave Chinner
2018-10-22  6:55             ` [Ocfs2-devel] " Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=154013853780.29026.5441191187672186537.stgit@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=sandeen@redhat.com \
    --subject='Re: [PATCH 05/28] vfs: avoid problematic remapping requests into partial EOF block' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.