From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: cluster-devel <cluster-devel@redhat.com>, Jan Kara <jack@suse.cz>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable
Date: Fri, 27 Aug 2021 21:48:55 +0000
Message-ID: <YSldx9uhMYhT/G8X@zeniv-ca.linux.org.uk>
In-Reply-To: <YSk+9cTMYi2+BFW7@zeniv-ca.linux.org.uk>

On Fri, Aug 27, 2021 at 07:37:25PM +0000, Al Viro wrote:
> On Fri, Aug 27, 2021 at 12:33:00PM -0700, Linus Torvalds wrote:
> > On Fri, Aug 27, 2021 at 12:23 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > Could you show the cases where "partial copy, so it's OK" behaviour would
> > > break anything?
> > 
> > Absolutely.
> > 
> > For example, it would cause an infinite loop in
> > restore_fpregs_from_user() if the "buf" argument is a situation where
> > the first page is fine, but the next page is not.
> > 
> > Why? Because __restore_fpregs_from_user() would take a fault, but then
> > fault_in_pages_readable() (renamed) would succeed, so you'd just do
> > that "retry" forever and ever.
> > 
> > Probably there are a number of other places too. That was literally
> > the *first* place I looked at.
> 
> OK...
> 
> Let me dig out the notes from the last time I looked through that area
> and grep around a bit.  Should be about an hour or two.

OK, I've dug it out and rechecked the current mainline.

Call trees:

fault_in_pages_readable()
	kvm_use_magic_page()

Broken, as per mpe.  Relevant part (see <87eeeqa7ng.fsf@mpe.ellerman.id.au> in
your mailbox back in early May for the full story):
|The current code is confused, ie. broken.
...
|We want to check that the mapping succeeded, that the address is
|readable (& writeable as well actually).
...
|diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
...
|-       if (!fault_in_pages_readable((const char *)KVM_MAGIC_PAGE, sizeof(u32))) {
|+       if (get_kernel_nofault(c, (const char *)KVM_MAGIC_PAGE)) {

	[ppc32]swapcontext()
	[ppc32]debug_setcontext()
	[ppc64]swapcontext()

Same situation in all three - it's going to kill the process if copy-in
fails, so it tries to be gentler about it and treat fault-in failures
as -EFAULT from syscall.  AFAICS, it's pointless, but I would like
comments from ppc folks.  Note that bogus *contents* of the
struct ucontext passed by user is almost certainly going to end up
with segfault; trying to catch the cases when bogus address happens
to point someplace unreadable is rather useless in that situation.

	restore_fpregs_from_user()
The one you've caught; hadn't been there last time I'd checked (back in
April).  Its counterpart in copy_fpstate_to_sigframe() had been, though.

	armada_gem_pwrite_ioctl()
Pointless, along with the access_ok() there - it does copy_from_user()
on that area shortly afterwards and failure of either is not a fast path.
	copy_page_from_iter_iovec()
Will do the right thing on short copy of any kind; we are fine with either
semantics.
	iov_iter_fault_in_readable()
		generic_perform_write()
Any short copy that had not led to progress (== rejected by ->write_end())
will lead to next chunk shortened accordingly, so ->write_begin() would be
asked to prepare for the amount we expect to be able to copy; ->write_end()
should be fine with that.  Failure to copy anything at all (possible due to
eviction on memory pressure, etc.) leads to retry of the same chunk as the
last time, and that's where we rely on fault-in rejecting "nothing could be
faulted in" case.  That one is fine with partial fault-in reported as success.
		f2fs_file_write_iter()
Odd prealloc-related stuff.  AFAICS, from the correctness POV either variant
of semantics would do, but I'm not sure whether either is the right match
for what they are trying to do there.
		fuse_fill_write_pages()
Similar to generic_perform_write() situation, only simpler (no ->write_end()
counterpart there).  All we care about is failure if nothing could be faulted
in.
		btrfs_buffered_write()
Again, similar to generic_perform_write().  More convoluted (after a short
copy it switches to going page-by-page and getting destination pages uptodate,
which will be equivalent to ->write_end() always accepting everything it's
given from that point on), but it's the same "we care only about failure
to fault in the first page" situation.
		ntfs_perform_write()
Another generic_perform_write() analogue.  Same situation wrt fault-in
semantics.
		iomap_write_actor()
Another generic_perform_write() relative.  Same situation.
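
The generic_perform_write()-style callers above can be illustrated with a
small userspace model.  This is only a sketch under stated assumptions, not
kernel code: every name in it (SIM_PAGE, struct sim_user,
sim_fault_in_readable(), sim_copy_from_user(), sim_perform_write()) is
hypothetical, "pages" are 16 bytes, and ->write_begin()/->write_end() and
eviction under memory pressure are left out.  It shows why such a loop is
fine with partial fault-in reported as success: a short copy still makes
progress, and the loop stops once nothing at the current position can be
faulted in.

```c
#include <stddef.h>
#include <string.h>

#define SIM_PAGE 16	/* hypothetical tiny "page" size for the model */

/* Model of a user buffer: page i is readable iff readable[i] != 0. */
struct sim_user {
	const char *data;
	const unsigned char *readable;
	size_t len;
};

/* Partial-success fault-in: fails only if the first page is unreadable. */
static int sim_fault_in_readable(const struct sim_user *u, size_t off,
				 size_t len)
{
	if (len == 0)
		return 0;
	return u->readable[off / SIM_PAGE] ? 0 : -1;
}

/* Copy up to the first unreadable page; returns the bytes actually copied. */
static size_t sim_copy_from_user(char *dst, const struct sim_user *u,
				 size_t off, size_t len)
{
	size_t n = 0;

	while (n < len && u->readable[(off + n) / SIM_PAGE])
		n++;
	memcpy(dst, u->data + off, n);
	return n;
}

/*
 * Write loop: returns the number of bytes written, or -1 (standing in for
 * -EFAULT) if nothing at all could be written from a non-empty buffer.
 */
long sim_perform_write(char *file, const struct sim_user *u)
{
	size_t pos = 0;

	while (pos < u->len) {
		size_t chunk = u->len - pos;

		if (chunk > 2 * SIM_PAGE)
			chunk = 2 * SIM_PAGE;
		/* Only "nothing could be faulted in" ends the loop. */
		if (sim_fault_in_readable(u, pos, chunk) != 0)
			break;
		/* A short copy shortens this step but still advances pos. */
		pos += sim_copy_from_user(file + pos, u, pos, chunk);
	}
	if (pos == 0 && u->len != 0)
		return -1;
	return (long)pos;
}
```

In this model a buffer whose third page is unreadable yields a short write
of two pages rather than an error or a spin.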


fault_in_pages_writeable()
	copy_fpstate_to_sigframe()
Same kind of "retry everything from scratch on short copy" as in the other
fpu/signal.c case.
	[btrfs]search_ioctl()
Broken with memory poisoning, for either variant of semantics.  Same for
arm64 sub-page permission differences, I think.
	copy_page_to_iter_iovec()
Will do the right thing on short copy of any kind; we are fine with either
semantics.

So we have 3 callers where we want all-or-nothing semantics - two in
arch/x86/kernel/fpu/signal.c and one in btrfs.  HWPOISON will be a problem
for all 3, AFAICS...
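
To make the all-or-nothing case concrete, here is a userspace sketch of a
"retry everything from scratch" loop.  It is a model under stated
assumptions, not the fpu/signal.c code: all names (SIM_PAGE,
sim_range_readable(), sim_fault_in_first(), sim_fault_in_all(),
sim_restore_with_retry()) are hypothetical, and a retry budget is added
purely so that the model terminates where the real loop would keep spinning.

```c
#include <stddef.h>

#define SIM_PAGE 16	/* hypothetical tiny "page" size for the model */

/* 0 if every page in [off, off + len) is readable, -1 otherwise. */
static int sim_range_readable(const unsigned char *readable, size_t off,
			      size_t len)
{
	if (len == 0)
		return 0;
	for (size_t p = off / SIM_PAGE; p <= (off + len - 1) / SIM_PAGE; p++)
		if (!readable[p])
			return -1;
	return 0;
}

/* Partial-success fault-in: only the first page has to be readable. */
int sim_fault_in_first(const unsigned char *readable, size_t off, size_t len)
{
	(void)len;
	return readable[off / SIM_PAGE] ? 0 : -1;
}

/* All-or-nothing fault-in: the whole range has to be readable. */
int sim_fault_in_all(const unsigned char *readable, size_t off, size_t len)
{
	return sim_range_readable(readable, off, len);
}

typedef int (*sim_fault_in_fn)(const unsigned char *, size_t, size_t);

/*
 * Models an operation that cannot make partial progress: it either consumes
 * the whole range or fails and is retried after a fault-in.
 * Returns 0 on success, -1 (standing in for -EFAULT) when fault-in fails,
 * and -2 when the budget runs out, i.e. where the real loop would never end.
 */
int sim_restore_with_retry(const unsigned char *readable, size_t off,
			   size_t len, sim_fault_in_fn fault_in, int budget)
{
	while (budget-- > 0) {
		if (sim_range_readable(readable, off, len) == 0)
			return 0;
		if (fault_in(readable, off, len) != 0)
			return -1;
	}
	return -2;
}
```

With the first page readable and the second not, the partial-success helper
exhausts the budget, while the all-or-nothing helper reports failure on the
first retry.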

IOW, it looks like we have two different things mixed here - one that wants
to try and fault stuff in, with callers caring only about having _something_
faulted in (most of the users) and one that wants to make sure we *can* do
stores or loads on each byte in the affected area.

Just accessing a byte in each page really won't suffice for the second kind.
Neither will g-u-p use, unless we teach it about HWPOISON and other fun
beasts...  Looks like we want that thing to be a separate primitive; for
btrfs I'd probably replace fault_in_pages_writeable() with clear_user()
as a quick fix for now...
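
A rough userspace model of why a one-byte-per-page probe is not enough, and
of what a clear_user()-style check buys.  The names (SIM_PAGE, struct
sim_mem, sim_probe_per_page(), sim_clear_user()) are hypothetical; the only
property borrowed from the real clear_user() is that it returns the number
of bytes that could not be cleared.  The model assumes faults can occur at
sub-page granularity (poisoned memory, sub-page permission differences),
represented here by a per-byte "bad" map.

```c
#include <stddef.h>

#define SIM_PAGE 16	/* hypothetical tiny "page" size for the model */

struct sim_mem {
	unsigned char *bytes;
	const unsigned char *bad;	/* bad[i] != 0: a store to byte i faults */
};

/* Probe the first byte of each page plus the last byte of the range. */
int sim_probe_per_page(const struct sim_mem *m, size_t off, size_t len)
{
	size_t end = off + len;

	if (len == 0)
		return 0;
	for (size_t a = off; a < end; a = (a / SIM_PAGE + 1) * SIM_PAGE)
		if (m->bad[a])
			return -1;
	return m->bad[end - 1] ? -1 : 0;
}

/* Zero the range byte by byte; returns the number of bytes NOT cleared. */
size_t sim_clear_user(struct sim_mem *m, size_t off, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (m->bad[off + i])
			return len - i;
		m->bytes[off + i] = 0;
	}
	return 0;
}
```

With a single bad byte in the middle of a page, the per-page probe reports
success while the byte-by-byte clear reports a non-zero remainder.  Note
that this kind of check destroys the buffer contents, so it presumably only
suits destinations that are about to be overwritten anyway.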

Comments?


