From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	linux-cifs@vger.kernel.org,
	overlayfs <linux-unionfs@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH v3 00/25] fs: fixes for serious clone/dedupe problems
Date: Thu, 11 Oct 2018 08:55:04 -0700
Message-ID: <20181011155504.GZ28243@magnolia>
In-Reply-To: <CAOQ4uxgOvOOnKL5TsC9jpjBsepAgtQ56Hhjh7WDeXM7m0=dz7g@mail.gmail.com>

On Thu, Oct 11, 2018 at 11:33:57AM +0300, Amir Goldstein wrote:
> On Thu, Oct 11, 2018 at 7:12 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > Hi all,
> >
> > Dave, Eric, and I have been chasing a stale data exposure bug in the XFS
> > reflink implementation, and tracked it down to reflink forgetting to do
> > some of the file-extending activities that must happen for regular
> > writes.
> >
> > We then started auditing the clone, dedupe, and copyfile code and
> > realized that from a file contents perspective, clonerange isn't any
> > different from a regular file write. Unfortunately, we also noticed
> > that *unlike* a regular write, clonerange skips a ton of overflow
> > checks, such as validating the ranges against s_maxbytes, MAX_NON_LFS,
> > and RLIMIT_FSIZE. We also observed that cloning into a file did not
> > strip security privileges (suid, capabilities) like a regular write
> > would. I also noticed that xfs and ocfs2 need to dump the page cache
> > before remapping blocks, not after.
> >
> > In fixing the range checking problems I also realized that both dedupe
> > and copyfile tell userspace how much of the requested operation was
> > acted upon. Since the range validation can shorten a clone request (or
> > we can ENOSPC midway through), we might as well plumb the short
> > operation reporting back through the VFS indirection code to userspace.
> >
> > So, here's the whole giant pile of patches[1] that fix all the problems.
> > This branch is against 4.19-rc7 with Dave Chinner's XFS for-next branch.
> > The patch "generic: test reflink side effects" recently sent to fstests
> > exercises the fixes in this series. Tests are in [2].
> >
> > --D
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=djwong-devel
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=djwong-devel
>
> I tested your branch with overlayfs over xfs.
> I did not observe any failures with -g clone except for test generic/937
> which also failed on xfs in my test.

Ok, matches what I saw overnight. Good, that means I (at least
theoretically) know how to test overlayfs now. :)

> I though that you forgot to mention I needed to grab xfsprogs from djwong-devel
> for commit e84a9e93 ("xfs_io: dedupe command should only complain
> if we don't dedupe anything"), but even with this change the test still fails:
>
> generic/937 - output mismatch (see
> /old/home/amir/src/fstests/xfstests-dev/results//generic/937.out.bad)
> --- tests/generic/937.out 2018-10-11 08:23:00.630938364 +0300
> +++ /old/home/amir/src/fstests/xfstests-dev/results//generic/937.out.bad
> 2018-10-11 10:54:40.448134832 +0300
> @@ -4,8 +4,7 @@
> 39578c21e2cb9f6049b1cf7fc7be12a6 TEST_DIR/test-937/file2
> Files 1-2 do not match (intentional)
> (partial) dedupe the middle blocks together
> -deduped XXXX/XXXX bytes at offset XXXX
> -XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +XFS_IOC_FILE_EXTENT_SAME: Extents did not match.

Ohhh, right, g/937 is the test to see if the dedupe implementation will
return a short bytes_deduped if a single byte at the end of the range
doesn't match. I'll have to update that because...

I reverted the FIDEDUPERANGE behavior to set ->info[x].bytes_deduped =
->src_length even if we rounded the length down to the nearest block
boundary to avoid incorrect sharing of blocks on files with
non-block-aligned EOF. It turned out that the existing FIDEDUPERANGE
users will hang in infinite loops if the kernel returns
->info[x].status == FILE_DEDUPE_RANGE_SAME but
->info[x].bytes_deduped < ->src_length.

It seems really stupid to me that the kernel now lies to userspace to
avoid breaking it, but that's what btrfs does so we're stuck with that.
For now.

> Compare sections
>
> One thing that *is* different with overlayfs test is that filefrag crashes
> on this same test:
>
> QA output created by 937
> Create the original files
> 35ac8d7917305c385c30f3d82c30a8f6 TEST_DIR/test-937/file1
> 39578c21e2cb9f6049b1cf7fc7be12a6 TEST_DIR/test-937/file2
> Files 1-2 do not match (intentional)
> (partial) dedupe the middle blocks together
> XFS_IOC_FILE_EXTENT_SAME: Extents did not match.
> ./tests/generic/937: line 59: 19242 Floating point exception(core
> dumped) ${FILEFRAG_PROG} -v $testdir/file1 >> $seqres.full
> ./tests/generic/937: line 60: 19244 Floating point exception(core
> dumped) ${FILEFRAG_PROG} -v $testdir/file2 >> $seqres.full
>
> It looks like an overlayfs v4.19-rc1 regression - FIGETBSZ returns zero.
> I never noticed this regression before, because none of the generic tests
> are using filefrag.

Funny, I was wondering just the other day if there were any filesystems
that set s_blocksize == 0... :)

--D

> Thanks,
> Amir.
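
For illustration only, here is a minimal sketch of a userspace FIDEDUPERANGE caller that would not spin forever if the kernel reported FILE_DEDUPE_RANGE_SAME together with a bytes_deduped shorter than src_length, the case discussed in the message above. It is not taken from the patch series or from any existing dedupe tool. The helper names dedupe_advance() and dedupe_range() are hypothetical; only the ioctl, struct file_dedupe_range, and the FILE_DEDUPE_RANGE_* constants come from <linux/fs.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: given the per-destination result of one
 * FIDEDUPERANGE call, return how many bytes the caller may advance.
 * A return of 0 means "stop": the ranges differ, or the kernel
 * reported no forward progress, so retrying would loop forever.
 */
uint64_t dedupe_advance(int32_t status, uint64_t bytes_deduped,
			uint64_t requested)
{
	if (status != FILE_DEDUPE_RANGE_SAME)
		return 0;
	/* Never advance past what was asked for. */
	if (bytes_deduped > requested)
		return requested;
	return bytes_deduped;
}

/*
 * Hypothetical caller: try to dedupe len bytes of src_fd at src_off
 * against dst_fd at dst_off. *done receives the number of bytes the
 * kernel reported as deduped. Returns 0 or a negative errno.
 */
int dedupe_range(int src_fd, int dst_fd, uint64_t src_off,
		 uint64_t dst_off, uint64_t len, uint64_t *done)
{
	struct file_dedupe_range *req;
	uint64_t step;
	int ret = 0;

	*done = 0;
	/* One request header plus room for a single destination. */
	req = calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
	if (!req)
		return -ENOMEM;

	while (*done < len) {
		req->src_offset = src_off + *done;
		req->src_length = len - *done;
		req->dest_count = 1;
		req->info[0].dest_fd = dst_fd;
		req->info[0].dest_offset = dst_off + *done;
		req->info[0].bytes_deduped = 0;
		req->info[0].status = 0;

		if (ioctl(src_fd, FIDEDUPERANGE, req) < 0) {
			ret = -errno;
			break;
		}
		/* A negative status is a per-destination errno. */
		if (req->info[0].status < 0) {
			ret = req->info[0].status;
			break;
		}
		step = dedupe_advance(req->info[0].status,
				      req->info[0].bytes_deduped,
				      len - *done);
		if (step == 0)
			break;	/* differs, or no progress: do not retry */
		*done += step;
	}

	free(req);
	return ret;
}
```

The zero-progress check in dedupe_advance() is the part that matters here: a caller that simply retries the remaining range whenever bytes_deduped < src_length has no exit if the kernel keeps trimming the same unaligned tail.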