From: Dave Chinner <firstname.lastname@example.org>
To: email@example.com, firstname.lastname@example.org
Cc: email@example.com, firstname.lastname@example.org, email@example.com,
    firstname.lastname@example.org, email@example.com
Subject: [PATCH 0/11] fs: fixes for major copy_file_range() issues
Date: Mon, 3 Dec 2018 19:34:05 +1100
Message-ID: <firstname.lastname@example.org>

Hi folks,

As most of you already know, we really suck at introducing new
functionality. The recent problems we found with clone/dedupe file
range interfaces also plague the copy_file_range() API and
implementation. Not only doesn't it do exactly what the man page
says, the man page doesn't document everything the syscall does
either. There are a few problems:

    - can overwrite setuid files
    - can read from and overwrite active swap files
    - can overwrite immutable files
    - doesn't update timestamps
    - doesn't obey resource limits
    - doesn't catch overlapping copy ranges to the same file
    - doesn't consistently implement fallback strategies
    - doesn't error out when the source range extends past EOF like
      the man page says it should
    - isn't consistent with clone file range behaviour
    - inconsistent behaviour between filesystems
    - inconsistent fallback implementations

And so on. There's so much wrong, and I haven't even got to the
problems that the generic fallback code (i.e. do_splice_direct())
has. That's for another day.

So, what this series attempts to do is clean up the code, implement
all the missing checks, and provide an infrastructure layout that
allows for consistent behaviour across filesystems and allows
filesystems to control fallback mechanisms and cross-device copies.

I'll repeat that so it's clear: the series also enables cross-device
copies once all the problems are sorted out.

To that end, the current fallback code is moved to
generic_copy_file_range(), and that is called only if the filesystem
does not provide a ->copy_file_range implementation.
If the filesystem provides such a method, it must implement the page
cache copy fallback itself by calling generic_copy_file_range() when
appropriate. I did this because different filesystems have different
copy-offload capabilities and so need to fall back in different
situations. It's easier to have them call generic_copy_file_range()
to do that copy when necessary than it is to have them try to
communicate back up to vfs_copy_file_range() that it should run a
fallback copy.

To make all the implementations perform the same validity checks,
I've created a generic_copy_file_checks() which is similar to the
checks we do for clone/dedupe. It's not quite the same, but the core
is very similar. This strips setuid, updates timestamps, checks and
enforces filesystem and resource limits, bounds checks the copy
ranges, etc.

This needs to be run before we call ->remap_file_range() so that we
end up with consistent behaviour across copy_file_range() calls.
e.g. we want an XFS filesystem with reflink=1 (i.e. supports
->remap_file_range()) to behave the same as an XFS filesystem with
reflink=0. Hence we need to check all the parameters up front so we
don't end up with calls to ->remap_file_range() resulting in
different behaviour.

It also means that ->copy_file_range implementations only need to
bounds check the input against filesystem-internal constraints, not
everything. This makes the filesystem implementations simpler, and
means they can call the fallback generic_copy_file_range()
implementation without having to care about further bounds checking.

I have not changed the fallback behaviour of the CIFS, Ceph or NFS
client implementations. They still reject copy_file_range() to the
same file with EINVAL, even though it is supported by the fallback
and by filesystems that implement ->remap_file_range(). I'll leave it
for the maintainers to decide if they want to implement the manual
data copy fallback or not.
My personal opinion is that they should implement the fallback
wherever they can, but userspace has to be prepared for
copy_file_range() to fail and so implementing the fallback is an
optional feature.

In terms of testing, Darrick and I have been beating the hell out of
copy_file_range with fsx on XFS to sort out all the data corruption
problems it has exposed (we're still working on that). Patches have
been posted to enhance fsx and fsstress in fstests to exercise
clone/dedupe/copy_file_range. Thread here:

    https://www.spinics.net/lists/fstests/msg10920.html

I've also written a bounds/behaviour exercising test:

    https://marc.info/?l=fstests&m=154381938829897&w=2
    https://marc.info/?l=fstests&m=154381939029898&w=2
    https://marc.info/?l=fstests&m=154381939229899&w=2
    https://marc.info/?l=fstests&m=154381939329900&w=2

I don't know whether I've got all the permission tests right in this
patchset. There's absolutely no guidance in the documentation or the
code telling us when we should use file_permission,
inode_permission, etc, so I just added the things that made the
tests do the things I think are the right things to be doing.

To run the tests, you'll also need modifications to xfs_io to allow
it to modify state appropriately. This is something we have
overlooked in the past, and so a lot of xfs_io based behaviour
checking is not actually testing the syscall we thought it was
testing but is instead testing the permission checking of the open()
syscall.
Those patches are here:

    https://marc.info/?l=linux-xfs&m=154378403323889&w=2
    https://marc.info/?l=linux-xfs&m=154378403523890&w=2
    https://marc.info/?l=linux-xfs&m=154378403323888&w=2
    https://marc.info/?l=linux-xfs&m=154379644526132&w=2

These changes really need to go in before we merge any more
copy_file_range() features - we need to get the basics right and get
test coverage over it before we unleash things like NFS server-side
copies on unsuspecting users with filesystems that have busted
copy_file_range() implementations.

I'll be appending a man page patch to this series that documents all
the errors this syscall can throw, the expected behaviours, etc. The
test and the man page were written together first, and the
implementation changes were done second. So if you don't agree with
the behaviour, discuss what the man page patch should say and
define, then I'll change the test to reflect that and I'll go from
there.

-Dave.
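As a footnote to the point above that userspace has to be prepared
for copy_file_range() to fail, here is a minimal sketch of what that
can look like in a caller. The helper names (copy_with_fallback,
manual_copy, worth_manual_copy) and the set of errnos treated as
"try the manual copy" are assumptions made for illustration; neither
this series nor the man page defines them. The sketch also assumes
two distinct files, since a read()/write() loop over overlapping
ranges of one file is not safe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes copy_file_range() in <unistd.h> (glibc 2.27+) */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical policy helper: errors after which a manual copy is worth
 * trying. The list is an assumption, not an exhaustive or official set.
 * If the error was genuine (e.g. EPERM on an immutable file), the manual
 * copy below will simply report it again.
 */
static int worth_manual_copy(int err)
{
    return err == ENOSYS || err == EXDEV || err == EINVAL ||
           err == EOPNOTSUPP || err == EPERM;
}

/* Hypothetical helper: read()/write() loop at the current file offsets. */
static ssize_t manual_copy(int fd_in, int fd_out, size_t len)
{
    char buf[65536];
    size_t total = 0;

    while (total < len) {
        size_t want = len - total < sizeof buf ? len - total : sizeof buf;
        ssize_t nr = read(fd_in, buf, want);

        if (nr < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (nr == 0)
            break; /* EOF on the source: short copy */

        for (ssize_t off = 0; off < nr; ) {
            ssize_t nw = write(fd_out, buf + off, (size_t)(nr - off));

            if (nw < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nw;
        }
        total += (size_t)nr;
    }
    return (ssize_t)total;
}

/*
 * Copy up to len bytes from fd_in to fd_out at their current offsets.
 * Returns the number of bytes copied (possibly short at EOF), or -1 with
 * errno set. On -1 some data may already have been copied.
 */
ssize_t copy_with_fallback(int fd_in, int fd_out, size_t len)
{
    size_t total = 0;

    while (total < len) {
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
                                    len - total, 0);
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == 0)
            break; /* source is at EOF */
        if (errno == EINTR)
            continue;
        if (!worth_manual_copy(errno))
            return -1;

        /* NULL offsets mean both file positions already reflect 'total'. */
        ssize_t rest = manual_copy(fd_in, fd_out, len - total);
        if (rest < 0)
            return -1;
        total += (size_t)rest;
        break;
    }
    return (ssize_t)total;
}
```

Passing NULL for both offsets makes the kernel use and advance the
file positions, which is what lets the manual loop pick up where a
partial copy_file_range() left off. A caller that needs explicit
offsets would have to carry them through to pread()/pwrite() instead.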

Thread overview: 83+ messages

2018-12-03  8:34 Dave Chinner [this message]
2018-12-03  8:34 ` [PATCH 01/11] vfs: copy_file_range source range over EOF should fail Dave Chinner
2018-12-03 12:46   ` Amir Goldstein
2018-12-04 15:13   ` Christoph Hellwig
2018-12-04 21:29   ` Dave Chinner
2018-12-04 21:47   ` Olga Kornievskaia
2018-12-04 22:31   ` Dave Chinner
2018-12-05 16:51   ` bfields
2019-05-20  9:10   ` Amir Goldstein
2019-05-20 13:12   ` Olga Kornievskaia
2019-05-20 13:36   ` Amir Goldstein
2019-05-20 13:58   ` Olga Kornievskaia
2019-05-20 14:02   ` Amir Goldstein
2018-12-05 14:12   ` Christoph Hellwig
2018-12-05 21:08   ` Dave Chinner
2018-12-05 21:30   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 02/11] vfs: introduce generic_copy_file_range() Dave Chinner
2018-12-03 10:03   ` Amir Goldstein
2018-12-03 23:00   ` Dave Chinner
2018-12-04 15:14   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 03/11] vfs: no fallback for ->copy_file_range Dave Chinner
2018-12-03 10:22   ` Amir Goldstein
2018-12-03 23:02   ` Dave Chinner
2018-12-06  4:16   ` Amir Goldstein
2018-12-06 21:30   ` Dave Chinner
2018-12-07  5:38   ` Amir Goldstein
2018-12-03 18:23   ` Anna Schumaker
2018-12-04 15:16   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 04/11] vfs: add missing checks to copy_file_range Dave Chinner
2018-12-03 12:42   ` Amir Goldstein
2018-12-03 19:04   ` Darrick J. Wong
2018-12-03 21:33   ` Olga Kornievskaia
2018-12-03 23:04   ` Dave Chinner
2018-12-04 15:18   ` Christoph Hellwig
2018-12-12 11:31   ` Luis Henriques
2018-12-12 16:42   ` Darrick J. Wong
2018-12-12 18:55   ` Olga Kornievskaia
2018-12-12 19:42   ` Matthew Wilcox
2018-12-12 20:22   ` Olga Kornievskaia
2018-12-13 10:29   ` Luis Henriques
2018-12-03  8:34 ` [PATCH 05/11] vfs: use inode_permission in copy_file_range() Dave Chinner
2018-12-03 12:47   ` Amir Goldstein
2018-12-03 18:18   ` Darrick J. Wong
2018-12-03 23:55   ` Dave Chinner
2018-12-05 17:28   ` bfields
2018-12-03 18:53   ` Eric Biggers
2018-12-04 15:19   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 06/11] vfs: copy_file_range needs to strip setuid bits Dave Chinner
2018-12-03 12:51   ` Amir Goldstein
2018-12-04 15:21   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 07/11] vfs: copy_file_range should update file timestamps Dave Chinner
2018-12-03 10:47   ` Amir Goldstein
2018-12-03 17:33   ` Olga Kornievskaia
2018-12-03 18:22   ` Darrick J. Wong
2018-12-03 23:19   ` Dave Chinner
2018-12-04 15:24   ` Christoph Hellwig
2018-12-03  8:34 ` [PATCH 08/11] vfs: push EXDEV check down into ->remap_file_range Dave Chinner
2018-12-03 11:04   ` Amir Goldstein
2018-12-03 19:11   ` Darrick J. Wong
2018-12-03 23:37   ` Dave Chinner
2018-12-03 23:58   ` Darrick J. Wong
2018-12-04  9:17   ` Amir Goldstein
2018-12-03 23:34   ` Dave Chinner
2018-12-03 18:24   ` Darrick J. Wong
2018-12-04  8:18   ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 09/11] vfs: push copy_file_ranges -EXDEV checks down Dave Chinner
2018-12-03 12:36   ` Amir Goldstein
2018-12-03 17:58   ` Olga Kornievskaia
2018-12-03 18:53   ` Anna Schumaker
2018-12-03 19:27   ` Olga Kornievskaia
2018-12-03 23:40   ` Dave Chinner
2018-12-04 15:43   ` Christoph Hellwig
2018-12-04 22:18   ` Dave Chinner
2018-12-04 23:33   ` Olga Kornievskaia
2018-12-05 14:09   ` Christoph Hellwig
2018-12-05 17:01   ` Olga Kornievskaia
2018-12-03  8:34 ` [PATCH 10/11] vfs: allow generic_copy_file_range to copy across devices Dave Chinner
2018-12-03 12:54   ` Amir Goldstein
2018-12-03  8:34 ` [PATCH 11/11] ovl: allow cross-device copy_file_range calls Dave Chinner
2018-12-03 12:55   ` Amir Goldstein
2018-12-03  8:39 ` [PATCH 12/11] man-pages: copy_file_range updates Dave Chinner
2018-12-03 13:05   ` Amir Goldstein
2019-05-21  5:52   ` Amir Goldstein