From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org, ocfs2-devel@oss.oracle.com,
	sandeen@redhat.com
Subject: Re: [PATCH 00/15] fs: fixes for serious clone/dedupe problems
Date: Fri, 5 Oct 2018 11:17:18 +1000
Message-ID: <20181005011718.GX31060@dastard>
In-Reply-To: <153870027422.29072.7433543674436957232.stgit@magnolia>

On Thu, Oct 04, 2018 at 05:44:34PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Dave, Eric, and I have been chasing a stale data exposure bug in the XFS
> reflink implementation, and tracked it down to reflink forgetting to do
> some of the file-extending activities that must happen for regular
> writes.
> 
> We then started auditing the clone, dedupe, and copyfile code and
> realized that from a file contents perspective, clonerange isn't any
> different from a regular file write.  Unfortunately, we also noticed
> that *unlike* a regular write, clonerange skips a ton of overflow
> checks, such as validating the ranges against s_maxbytes, MAX_NON_LFS,
> and RLIMIT_FSIZE.  We also observed that cloning into a file did not
> strip security privileges (suid, capabilities) like a regular write
> would.  I also noticed that xfs and ocfs2 need to dump the page cache
> before remapping blocks, not after.
> 
> In fixing the range checking problems I also realized that both dedupe
> and copyfile tell userspace how much of the requested operation was
> acted upon.  Since the range validation can shorten a clone request (or
> we can ENOSPC midway through), we might as well plumb the short
> operation reporting back through the VFS indirection code to userspace.
> 
> So, here's the whole giant pile of patches[1] that fix all the problems.
> The patch "generic: test reflink side effects" recently sent to fstests
> exercises the fixes in this series.  Tests are in [2].

Hmmm. I've got a couple of patches to fix dedupe/reflink partial EOF
block data corruptions, too. I'll have to see how they fit into this
new series; combined, they add this code just after the call to
vfs_clone_file_prep_inodes():

....
+       u64                     blkmask = i_blocksize(inode_in) - 1;
....
+       /*
+        * If the dedupe data matches, chop off the partial EOF block
+        * from the source file so we don't try to dedupe the partial
+        * EOF block.
+        */
+       if (is_dedupe) {
+               len &= ~blkmask;
+       } else if (len & blkmask) {
+               /*
+                * The user is attempting to share a partial EOF block,
+                * if it's inside the destination EOF then reject it
+                */
+               if (pos_out + len < i_size_read(inode_out)) {
+                       ret = -EINVAL;
+                       goto out_unlock;
+               }
+       }

It might be better to put these in with the eof-zeroing patch and then
add all the other changes on top? Let me post them separately,
as they may be candidates for 4.19-rc7 along with the eof zeroing.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181005011718.GX31060@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.