All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: fix another couple of reflink data corruptions
@ 2018-10-05  1:23 Dave Chinner
  2018-10-05  1:23 ` [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Dave Chinner
  2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2018-10-05  1:23 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

These are the fixes needed to avoid having dedupe and reflink expose
stale data due to sharing unitialised post EOF data from the source
file inside the destination file's EOF.

Cheers,

Dave.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges
  2018-10-05  1:23 [PATCH 0/2] xfs: fix another couple of reflink data corruptions Dave Chinner
@ 2018-10-05  1:23 ` Dave Chinner
  2018-10-05 23:58   ` Darrick J. Wong
  2018-10-06 10:47   ` Christoph Hellwig
  2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
  1 sibling, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2018-10-05  1:23 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

A deduplication data corruption is Exposed by fstests generic/505 on
XFS. It is caused by extending the block match range to include the
partial EOF block, but then allowing unknown data beyond EOF to be
considered a "match" to data in the destination file because the
comparison is only made to the end of the source file. This corrupts
the destination file when the source extent is shared with it.

XFS only supports whole block dedupe, but we still need to appear to
support whole file dedupe correctly.  Hence if the dedupe request
includes the last block of the souce file, don't include it in the
actual XFS dedupe operation. If the rest of the range dedupes
successfully, then report the partial last block as deduped, too, so
that userspace sees it as a successful dedupe rather than return
EINVAL because we can't dedupe unaligned blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_reflink.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 5289e22cb081..6b0da1b80103 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1222,6 +1222,19 @@ xfs_iolock_two_inodes_and_break_layout(
 
 /*
  * Link a range of blocks from one file to another.
+ *
+ * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't
+ * checked that the bytes beyond EOF physically match. Hence we cannot use the
+ * EOF block in the source dedupe range because it's not a complete block match,
+ * hence can introduce a corruption into the file that has it's
+ * block replaced.
+ *
+ * Despite this issue, we still need to report that range as successfully
+ * deduped to avoid confusing userspace with EINVAL errors on completely
+ * matching file data. The only time that an unaligned length will be passed to
+ * us is when it spans the EOF block of the source file, so if we simply mask it
+ * down to be block aligned here the we will dedupe everything but that partial
+ * EOF block.
  */
 int
 xfs_reflink_remap_range(
@@ -1274,6 +1287,14 @@ xfs_reflink_remap_range(
 	if (ret <= 0)
 		goto out_unlock;
 
+	/*
+	 * If the dedupe data matches, chop off the partial EOF block
+	 * from the source file so we don't try to dedupe the partial
+	 * EOF block.
+	 */
+	if (is_dedupe)
+		len &= ~((u64)i_blocksize(inode_in) - 1);
+
 	/* Attach dquots to dest inode before changing block map */
 	ret = xfs_qm_dqattach(dest);
 	if (ret)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  1:23 [PATCH 0/2] xfs: fix another couple of reflink data corruptions Dave Chinner
  2018-10-05  1:23 ` [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Dave Chinner
@ 2018-10-05  1:23 ` Dave Chinner
  2018-10-05  1:40   ` Darrick J. Wong
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Dave Chinner @ 2018-10-05  1:23 UTC (permalink / raw)
  To: linux-xfs

When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.

XFS only supports whole block sharing, but we still need to
support whole file reflink correctly.  Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.

This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6b0da1b80103..2615271603ce 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout(
  * hence can introduce a corruption into the file that has it's
  * block replaced.
  *
- * Despite this issue, we still need to report that range as successfully
- * deduped to avoid confusing userspace with EINVAL errors on completely
- * matching file data. The only time that an unaligned length will be passed to
- * us is when it spans the EOF block of the source file, so if we simply mask it
- * down to be block aligned here the we will dedupe everything but that partial
- * EOF block.
+ * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
+ * "block aligned" for the purposes of cloning entire files. 
+ * However, if the source file range
+ * includes the EOF block and it lands within the existing EOF of the
+ * destination file, then we can expose stale data from beyond the source file
+ * EOF in the destination file.
+ *
+ * XFs doesn't support partial block sharing, so in both cases we have check
+ * these cases ourselves. For dedupe, we can simply round the length to dedupe
+ * down to the previous whole block and ignore the partial EOF block. While this
+ * means we can't dedupe the last block of a file, this is an acceptible
+ * tradeoff for simplicity on implementation.
+ *
+ * For cloning, we want to share the partial EOF block if it is also the new EOF
+ * block of the destination file. If the partial EOF blck lies inside the
+ * existing destination EOF, then we have to abort the clone to avoid exposing
+ * stale data int eh destination file. Hence we reject these clone attempts with
+ * -EINVAL in this case.
  */
 int
 xfs_reflink_remap_range(
@@ -1255,6 +1267,7 @@ xfs_reflink_remap_range(
 	xfs_filblks_t		fsblen;
 	xfs_extlen_t		cowextsize;
 	ssize_t			ret;
+	u64			blkmask = i_blocksize(inode_in) - 1;
 
 	if (!xfs_sb_version_hasreflink(&mp->m_sb))
 		return -EOPNOTSUPP;
@@ -1292,8 +1305,18 @@ xfs_reflink_remap_range(
 	 * from the source file so we don't try to dedupe the partial
 	 * EOF block.
 	 */
-	if (is_dedupe)
-		len &= ~((u64)i_blocksize(inode_in) - 1);
+	if (is_dedupe) {
+		len &= ~blkmask;
+	} else if (len & blkmask) {
+		/*
+		 * The user is attempting to share a partial EOF block,
+		 * if it's inside the destination EOF then reject it
+		 */
+		if (pos_out + len < i_size_read(inode_out)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+	}
 
 	/* Attach dquots to dest inode before changing block map */
 	ret = xfs_qm_dqattach(dest);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
@ 2018-10-05  1:40   ` Darrick J. Wong
  2018-10-05  5:21     ` Dave Chinner
  2018-10-06  0:00   ` Darrick J. Wong
  2018-10-06 10:54   ` Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-10-05  1:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 05, 2018 at 11:23:36AM +1000, Dave Chinner wrote:
> When reflinking sub-file ranges, a data corruption can occur when
> the source file range includes a partial EOF block. This shares the
> unknown data beyond EOF into the second file at a position inside
> EOF, exposing stale data in the second file.
> 
> XFS only supports whole block sharing, but we still need to
> support whole file reflink correctly.  Hence if the reflink
> request includes the last block of the souce file, only proceed with
> the reflink operation if it lands at or past the destination file's
> current EOF. If it lands within the destination file EOF, reject the
> entire request with -EINVAL and make the caller go the hard way.
> 
> This avoids the data corruption vector, but also avoids disruption
> of returning EINVAL to userspace for the common case of whole file
> cloning.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 6b0da1b80103..2615271603ce 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout(
>   * hence can introduce a corruption into the file that has it's
>   * block replaced.
>   *
> - * Despite this issue, we still need to report that range as successfully
> - * deduped to avoid confusing userspace with EINVAL errors on completely
> - * matching file data. The only time that an unaligned length will be passed to
> - * us is when it spans the EOF block of the source file, so if we simply mask it
> - * down to be block aligned here the we will dedupe everything but that partial
> - * EOF block.
> + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
> + * "block aligned" for the purposes of cloning entire files. 
> + * However, if the source file range
> + * includes the EOF block and it lands within the existing EOF of the
> + * destination file, then we can expose stale data from beyond the source file
> + * EOF in the destination file.
> + *
> + * XFs doesn't support partial block sharing, so in both cases we have check
> + * these cases ourselves. For dedupe, we can simply round the length to dedupe
> + * down to the previous whole block and ignore the partial EOF block. While this
> + * means we can't dedupe the last block of a file, this is an acceptible
> + * tradeoff for simplicity on implementation.
> + *
> + * For cloning, we want to share the partial EOF block if it is also the new EOF
> + * block of the destination file. If the partial EOF blck lies inside the
> + * existing destination EOF, then we have to abort the clone to avoid exposing
> + * stale data int eh destination file. Hence we reject these clone attempts with
> + * -EINVAL in this case.
>   */
>  int
>  xfs_reflink_remap_range(
> @@ -1255,6 +1267,7 @@ xfs_reflink_remap_range(
>  	xfs_filblks_t		fsblen;
>  	xfs_extlen_t		cowextsize;
>  	ssize_t			ret;
> +	u64			blkmask = i_blocksize(inode_in) - 1;
>  
>  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
>  		return -EOPNOTSUPP;
> @@ -1292,8 +1305,18 @@ xfs_reflink_remap_range(
>  	 * from the source file so we don't try to dedupe the partial
>  	 * EOF block.
>  	 */
> -	if (is_dedupe)
> -		len &= ~((u64)i_blocksize(inode_in) - 1);
> +	if (is_dedupe) {
> +		len &= ~blkmask;
> +	} else if (len & blkmask) {
> +		/*
> +		 * The user is attempting to share a partial EOF block,
> +		 * if it's inside the destination EOF then reject it
> +		 */
> +		if (pos_out + len < i_size_read(inode_out)) {
> +			ret = -EINVAL;
> +			goto out_unlock;

Hmm... to integrate this with the new series I just posted, I think we'd
decrease len to be block aligned (perhaps in generic_clone_checks) so
that copy_file_range would be able to pagecache copy the last bit
instead of failing the whole operation.  IOWs,

if (is_dedupe) {
	len &= ~blkmask;
} else if (len & blkmask) {
	if (pos_out + len < size_out) {
		len &= ~blkmask;
	}
}

But at a first glance these two patches look ok to me.

--D

> +		}
> +	}
>  
>  	/* Attach dquots to dest inode before changing block map */
>  	ret = xfs_qm_dqattach(dest);
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  1:40   ` Darrick J. Wong
@ 2018-10-05  5:21     ` Dave Chinner
  2018-10-05 17:11       ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2018-10-05  5:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Oct 04, 2018 at 06:40:14PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 05, 2018 at 11:23:36AM +1000, Dave Chinner wrote:
> > When reflinking sub-file ranges, a data corruption can occur when
> > the source file range includes a partial EOF block. This shares the
> > unknown data beyond EOF into the second file at a position inside
> > EOF, exposing stale data in the second file.
> > 
> > XFS only supports whole block sharing, but we still need to
> > support whole file reflink correctly.  Hence if the reflink
> > request includes the last block of the souce file, only proceed with
> > the reflink operation if it lands at or past the destination file's
> > current EOF. If it lands within the destination file EOF, reject the
> > entire request with -EINVAL and make the caller go the hard way.
> > 
> > This avoids the data corruption vector, but also avoids disruption
> > of returning EINVAL to userspace for the common case of whole file
> > cloning.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++--------
> >  1 file changed, 31 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 6b0da1b80103..2615271603ce 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout(
> >   * hence can introduce a corruption into the file that has it's
> >   * block replaced.
> >   *
> > - * Despite this issue, we still need to report that range as successfully
> > - * deduped to avoid confusing userspace with EINVAL errors on completely
> > - * matching file data. The only time that an unaligned length will be passed to
> > - * us is when it spans the EOF block of the source file, so if we simply mask it
> > - * down to be block aligned here the we will dedupe everything but that partial
> > - * EOF block.
> > + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
> > + * "block aligned" for the purposes of cloning entire files. 
> > + * However, if the source file range
> > + * includes the EOF block and it lands within the existing EOF of the
> > + * destination file, then we can expose stale data from beyond the source file
> > + * EOF in the destination file.
> > + *
> > + * XFs doesn't support partial block sharing, so in both cases we have check
> > + * these cases ourselves. For dedupe, we can simply round the length to dedupe
> > + * down to the previous whole block and ignore the partial EOF block. While this
> > + * means we can't dedupe the last block of a file, this is an acceptible
> > + * tradeoff for simplicity on implementation.
> > + *
> > + * For cloning, we want to share the partial EOF block if it is also the new EOF
> > + * block of the destination file. If the partial EOF blck lies inside the
> > + * existing destination EOF, then we have to abort the clone to avoid exposing
> > + * stale data int eh destination file. Hence we reject these clone attempts with
> > + * -EINVAL in this case.
> >   */
> >  int
> >  xfs_reflink_remap_range(
> > @@ -1255,6 +1267,7 @@ xfs_reflink_remap_range(
> >  	xfs_filblks_t		fsblen;
> >  	xfs_extlen_t		cowextsize;
> >  	ssize_t			ret;
> > +	u64			blkmask = i_blocksize(inode_in) - 1;
> >  
> >  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
> >  		return -EOPNOTSUPP;
> > @@ -1292,8 +1305,18 @@ xfs_reflink_remap_range(
> >  	 * from the source file so we don't try to dedupe the partial
> >  	 * EOF block.
> >  	 */
> > -	if (is_dedupe)
> > -		len &= ~((u64)i_blocksize(inode_in) - 1);
> > +	if (is_dedupe) {
> > +		len &= ~blkmask;
> > +	} else if (len & blkmask) {
> > +		/*
> > +		 * The user is attempting to share a partial EOF block,
> > +		 * if it's inside the destination EOF then reject it
> > +		 */
> > +		if (pos_out + len < i_size_read(inode_out)) {
> > +			ret = -EINVAL;
> > +			goto out_unlock;
> 
> Hmm... to integrate this with the new series I just posted, I think we'd
> decrease len to be block aligned (perhaps in generic_clone_checks) so
> that copy_file_range would be able to pagecache copy the last bit
> instead of failing the whole operation.  IOWs,
> 
> if (is_dedupe) {
> 	len &= ~blkmask;
> } else if (len & blkmask) {
> 	if (pos_out + len < size_out) {
> 		len &= ~blkmask;
> 	}
> }

OK. But if I'm going to push it with just the EOF zeroing and
ctime/suid fixes, then this doesn't change until the handling of
partial completion is added to XFS later in the patchset, right?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  5:21     ` Dave Chinner
@ 2018-10-05 17:11       ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2018-10-05 17:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 05, 2018 at 03:21:21PM +1000, Dave Chinner wrote:
> On Thu, Oct 04, 2018 at 06:40:14PM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 05, 2018 at 11:23:36AM +1000, Dave Chinner wrote:
> > > When reflinking sub-file ranges, a data corruption can occur when
> > > the source file range includes a partial EOF block. This shares the
> > > unknown data beyond EOF into the second file at a position inside
> > > EOF, exposing stale data in the second file.
> > > 
> > > XFS only supports whole block sharing, but we still need to
> > > support whole file reflink correctly.  Hence if the reflink
> > > request includes the last block of the souce file, only proceed with
> > > the reflink operation if it lands at or past the destination file's
> > > current EOF. If it lands within the destination file EOF, reject the
> > > entire request with -EINVAL and make the caller go the hard way.
> > > 
> > > This avoids the data corruption vector, but also avoids disruption
> > > of returning EINVAL to userspace for the common case of whole file
> > > cloning.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++--------
> > >  1 file changed, 31 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > index 6b0da1b80103..2615271603ce 100644
> > > --- a/fs/xfs/xfs_reflink.c
> > > +++ b/fs/xfs/xfs_reflink.c
> > > @@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout(
> > >   * hence can introduce a corruption into the file that has it's
> > >   * block replaced.
> > >   *
> > > - * Despite this issue, we still need to report that range as successfully
> > > - * deduped to avoid confusing userspace with EINVAL errors on completely
> > > - * matching file data. The only time that an unaligned length will be passed to
> > > - * us is when it spans the EOF block of the source file, so if we simply mask it
> > > - * down to be block aligned here the we will dedupe everything but that partial
> > > - * EOF block.
> > > + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
> > > + * "block aligned" for the purposes of cloning entire files. 
> > > + * However, if the source file range
> > > + * includes the EOF block and it lands within the existing EOF of the
> > > + * destination file, then we can expose stale data from beyond the source file
> > > + * EOF in the destination file.
> > > + *
> > > + * XFs doesn't support partial block sharing, so in both cases we have check
> > > + * these cases ourselves. For dedupe, we can simply round the length to dedupe
> > > + * down to the previous whole block and ignore the partial EOF block. While this
> > > + * means we can't dedupe the last block of a file, this is an acceptible
> > > + * tradeoff for simplicity on implementation.
> > > + *
> > > + * For cloning, we want to share the partial EOF block if it is also the new EOF
> > > + * block of the destination file. If the partial EOF blck lies inside the
> > > + * existing destination EOF, then we have to abort the clone to avoid exposing
> > > + * stale data int eh destination file. Hence we reject these clone attempts with
> > > + * -EINVAL in this case.
> > >   */
> > >  int
> > >  xfs_reflink_remap_range(
> > > @@ -1255,6 +1267,7 @@ xfs_reflink_remap_range(
> > >  	xfs_filblks_t		fsblen;
> > >  	xfs_extlen_t		cowextsize;
> > >  	ssize_t			ret;
> > > +	u64			blkmask = i_blocksize(inode_in) - 1;
> > >  
> > >  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
> > >  		return -EOPNOTSUPP;
> > > @@ -1292,8 +1305,18 @@ xfs_reflink_remap_range(
> > >  	 * from the source file so we don't try to dedupe the partial
> > >  	 * EOF block.
> > >  	 */
> > > -	if (is_dedupe)
> > > -		len &= ~((u64)i_blocksize(inode_in) - 1);
> > > +	if (is_dedupe) {
> > > +		len &= ~blkmask;
> > > +	} else if (len & blkmask) {
> > > +		/*
> > > +		 * The user is attempting to share a partial EOF block,
> > > +		 * if it's inside the destination EOF then reject it
> > > +		 */
> > > +		if (pos_out + len < i_size_read(inode_out)) {
> > > +			ret = -EINVAL;
> > > +			goto out_unlock;
> > 
> > Hmm... to integrate this with the new series I just posted, I think we'd
> > decrease len to be block aligned (perhaps in generic_clone_checks) so
> > that copy_file_range would be able to pagecache copy the last bit
> > instead of failing the whole operation.  IOWs,
> > 
> > if (is_dedupe) {
> > 	len &= ~blkmask;
> > } else if (len & blkmask) {
> > 	if (pos_out + len < size_out) {
> > 		len &= ~blkmask;
> > 	}
> > }
> 
> OK. But if I'm going to push it with just the EOF zeroing and
> ctime/suid fixes, then this doesn't change until the handling of
> partial completion is added to XFS later in the patchset, right?

Right.  If you add this series before the partial completion patches
I'll fix things up when I rebase that part of my series.

--D

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges
  2018-10-05  1:23 ` [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Dave Chinner
@ 2018-10-05 23:58   ` Darrick J. Wong
  2018-10-06 10:47   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2018-10-05 23:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 05, 2018 at 11:23:35AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> A deduplication data corruption is Exposed by fstests generic/505 on
> XFS. It is caused by extending the block match range to include the
> partial EOF block, but then allowing unknown data beyond EOF to be
> considered a "match" to data in the destination file because the
> comparison is only made to the end of the source file. This corrupts
> the destination file when the source extent is shared with it.
> 
> XFS only supports whole block dedupe, but we still need to appear to
> support whole file dedupe correctly.  Hence if the dedupe request
> includes the last block of the souce file, don't include it in the
> actual XFS dedupe operation. If the rest of the range dedupes
> successfully, then report the partial last block as deduped, too, so
> that userspace sees it as a successful dedupe rather than return
> EINVAL because we can't dedupe unaligned blocks.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_reflink.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 5289e22cb081..6b0da1b80103 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1222,6 +1222,19 @@ xfs_iolock_two_inodes_and_break_layout(
>  
>  /*
>   * Link a range of blocks from one file to another.
> + *
> + * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't
> + * checked that the bytes beyond EOF physically match. Hence we cannot use the
> + * EOF block in the source dedupe range because it's not a complete block match,
> + * hence can introduce a corruption into the file that has it's
> + * block replaced.
> + *
> + * Despite this issue, we still need to report that range as successfully
> + * deduped to avoid confusing userspace with EINVAL errors on completely
> + * matching file data. The only time that an unaligned length will be passed to
> + * us is when it spans the EOF block of the source file, so if we simply mask it
> + * down to be block aligned here the we will dedupe everything but that partial
> + * EOF block.
>   */
>  int
>  xfs_reflink_remap_range(
> @@ -1274,6 +1287,14 @@ xfs_reflink_remap_range(
>  	if (ret <= 0)
>  		goto out_unlock;
>  
> +	/*
> +	 * If the dedupe data matches, chop off the partial EOF block
> +	 * from the source file so we don't try to dedupe the partial
> +	 * EOF block.
> +	 */
> +	if (is_dedupe)
> +		len &= ~((u64)i_blocksize(inode_in) - 1);

I think that truncating the length like this is going to cause a mess
since we don't have the plumbing to report the shorter dedupe length to
userspace.  Granted, this also causes stale data exposure and I don't
want to hold this up for my big long clonerange cleanup to land.

I'll probably end up cleaning up all this into a generic "check these
clone args for block alignment" later anyway, so you might as well go
ahead:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +
>  	/* Attach dquots to dest inode before changing block map */
>  	ret = xfs_qm_dqattach(dest);
>  	if (ret)
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
  2018-10-05  1:40   ` Darrick J. Wong
@ 2018-10-06  0:00   ` Darrick J. Wong
  2018-10-06 10:54   ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2018-10-06  0:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 05, 2018 at 11:23:36AM +1000, Dave Chinner wrote:
> When reflinking sub-file ranges, a data corruption can occur when
> the source file range includes a partial EOF block. This shares the
> unknown data beyond EOF into the second file at a position inside
> EOF, exposing stale data in the second file.
> 
> XFS only supports whole block sharing, but we still need to
> support whole file reflink correctly.  Hence if the reflink
> request includes the last block of the souce file, only proceed with
> the reflink operation if it lands at or past the destination file's
> current EOF. If it lands within the destination file EOF, reject the
> entire request with -EINVAL and make the caller go the hard way.
> 
> This avoids the data corruption vector, but also avoids disruption
> of returning EINVAL to userspace for the common case of whole file
> cloning.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 6b0da1b80103..2615271603ce 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout(
>   * hence can introduce a corruption into the file that has it's
>   * block replaced.
>   *
> - * Despite this issue, we still need to report that range as successfully
> - * deduped to avoid confusing userspace with EINVAL errors on completely
> - * matching file data. The only time that an unaligned length will be passed to
> - * us is when it spans the EOF block of the source file, so if we simply mask it
> - * down to be block aligned here the we will dedupe everything but that partial
> - * EOF block.
> + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be
> + * "block aligned" for the purposes of cloning entire files. 
> + * However, if the source file range
> + * includes the EOF block and it lands within the existing EOF of the
> + * destination file, then we can expose stale data from beyond the source file
> + * EOF in the destination file.
> + *
> + * XFs doesn't support partial block sharing, so in both cases we have check
> + * these cases ourselves. For dedupe, we can simply round the length to dedupe
> + * down to the previous whole block and ignore the partial EOF block. While this
> + * means we can't dedupe the last block of a file, this is an acceptible
> + * tradeoff for simplicity on implementation.
> + *
> + * For cloning, we want to share the partial EOF block if it is also the new EOF
> + * block of the destination file. If the partial EOF blck lies inside the
> + * existing destination EOF, then we have to abort the clone to avoid exposing
> + * stale data int eh destination file. Hence we reject these clone attempts with
> + * -EINVAL in this case.
>   */
>  int
>  xfs_reflink_remap_range(
> @@ -1255,6 +1267,7 @@ xfs_reflink_remap_range(
>  	xfs_filblks_t		fsblen;
>  	xfs_extlen_t		cowextsize;
>  	ssize_t			ret;
> +	u64			blkmask = i_blocksize(inode_in) - 1;
>  
>  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
>  		return -EOPNOTSUPP;
> @@ -1292,8 +1305,18 @@ xfs_reflink_remap_range(
>  	 * from the source file so we don't try to dedupe the partial
>  	 * EOF block.
>  	 */
> -	if (is_dedupe)
> -		len &= ~((u64)i_blocksize(inode_in) - 1);
> +	if (is_dedupe) {
> +		len &= ~blkmask;
> +	} else if (len & blkmask) {
> +		/*
> +		 * The user is attempting to share a partial EOF block,
> +		 * if it's inside the destination EOF then reject it
> +		 */
> +		if (pos_out + len < i_size_read(inode_out)) {
> +			ret = -EINVAL;
> +			goto out_unlock;

Same comment about reporting truncated lengths back to userspace as the
last patch, but nevertheless:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +		}
> +	}
>  
>  	/* Attach dquots to dest inode before changing block map */
>  	ret = xfs_qm_dqattach(dest);
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges
  2018-10-05  1:23 ` [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Dave Chinner
  2018-10-05 23:58   ` Darrick J. Wong
@ 2018-10-06 10:47   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-06 10:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks fine (modulo the concern from Darrick):

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges
  2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
  2018-10-05  1:40   ` Darrick J. Wong
  2018-10-06  0:00   ` Darrick J. Wong
@ 2018-10-06 10:54   ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-06 10:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-10-06 17:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-05  1:23 [PATCH 0/2] xfs: fix another couple of reflink data corruptions Dave Chinner
2018-10-05  1:23 ` [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Dave Chinner
2018-10-05 23:58   ` Darrick J. Wong
2018-10-06 10:47   ` Christoph Hellwig
2018-10-05  1:23 ` [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Dave Chinner
2018-10-05  1:40   ` Darrick J. Wong
2018-10-05  5:21     ` Dave Chinner
2018-10-05 17:11       ` Darrick J. Wong
2018-10-06  0:00   ` Darrick J. Wong
2018-10-06 10:54   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.