ocfs2-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: kvm-ppc@vger.kernel.org, Christoph Hellwig <hch@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>, Jan Kara <jack@suse.cz>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@ozlabs.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks
Date: Fri, 29 Oct 2021 00:15:55 +0200	[thread overview]
Message-ID: <CAHpGcMLeiXSjCJGY6SCJJ=bdNOspHLHofmTE8aC_sZtfHRG5ZA@mail.gmail.com> (raw)
In-Reply-To: <YXsUNMWFpmT1eQcX@arm.com>

Am Do., 28. Okt. 2021 um 23:21 Uhr schrieb Catalin Marinas
<catalin.marinas@arm.com>:
> One last try on this path before I switch to the other options.
>
> On Wed, Oct 27, 2021 at 02:14:48PM -0700, Linus Torvalds wrote:
> > On Wed, Oct 27, 2021 at 12:13 PM Catalin Marinas
> > <catalin.marinas@arm.com> wrote:
> > > As an alternative, you mentioned earlier that a per-thread fault status
> > > was not feasible on x86 due to races. Was this only for the hw poison
> > > case? I think the uaccess is slightly different.
> >
> > It's not x86-specific, it's very generic.
> >
> > If we set some flag in the per-thread status, we'll need to be careful
> > about not overwriting it if we then have a subsequent NMI that _also_
> > takes a (completely unrelated) page fault - before we then read the
> > per-thread flag.
> >
> > Think 'perf' and fetching backtraces etc.
> >
> > Note that the NMI page fault can easily also be a pointer coloring
> > fault on arm64, for exactly the same reason that whatever original
> > copy_from_user() code was. So this is not a "oh, pointer coloring
> > faults are different". They have the same re-entrancy issue.
> >
> > And both the "pagefault_disable" and "fault happens in interrupt
> > context" cases are also the exact same 'faulthandler_disabled()'
> > thing. So even at fault time they look very similar.
>
> They do look fairly similar but we should have the information in the
> fault handler to distinguish: not a page fault (pte permission or p*d
> translation), in_task(), user address, fixup handler. But I agree the
> logic looks fragile.
>
> I think for nested contexts we can save the uaccess fault state on
> exception entry, restore it on return. Or (needs some thinking on
> atomicity) save it in a local variable. The high-level API would look
> something like:
>
>         unsigned long uaccess_flags;    /* we could use TIF_ flags */
>
>         uaccess_flags = begin_retriable_uaccess();
>         copied = copy_page_from_iter_atomic(...);
>         retry = end_retriable_uaccess(uaccess_flags);
>         ...
>
>         if (!retry)
>                 break;
>
> I think we'd need a TIF flag to mark the retriable region and another to
> track whether a non-recoverable fault occurred. It needs prototyping.
>
> Anyway, if you don't like this approach, I'll look at error codes being
> returned but rather than changing all copy_from_user() etc., introduce a
> new API that returns different error codes depending on the fault
> (e.g -EFAULT vs -EACCES). We already have copy_from_user_nofault(), we'd
> need something for the iov_iter stuff to use in the fs code.

We won't need any of that on the filesystem read and write paths. The
two cases there are buffered and direct I/O:

* In the buffered I/O case, the copying happens with page faults
disabled, at a byte granularity. If that returns a short result, we
need to enable page faults, check if the exact address that failed
still fails (in which case we have a sub-page fault),  fault in the
pages, disable page faults again, and repeat. No probing for sub-page
faults beyond the first byte of the fault-in address is needed.
Functions fault_in_{readable,writeable} implicitly have this behavior;
for fault_in_safe_writeable() the choice we have is to either add
probing of the first byte for sub-page faults to this function or
force callers to do that probing separately. At this point, I'd vote
for the former.

* In the direct I/O case, the copying happens while we're holding page
references, so the only page faults that can occur during copying are
sub-page faults. When iomap_dio_rw or its legacy counterpart is called
with page faults disabled, we need to make sure that the caller can
distinguish between page faults triggered during
bio_iov_iter_get_pages() and during the copying, but that's a separate
problem. (At the moment, when iomap_dio_rw fails with -EFAULT, the
caller *cannot* distinguish between a bio_iov_iter_get_pages failure
and a failure during synchronous copying, but that could be fixed by
returning unique error codes from iomap_dio_rw.)

So as far as I can see, the only problematic case we're left with is
copying bigger than byte-size chunks with page faults disabled when we
don't know whether the underlying pages are resident or not. My guess
would be that in this case, if the copying fails, it would be
perfectly acceptable to explicitly probe the entire chunk for sub-page
faults.

Thanks,
Andreas

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

  parent reply	other threads:[~2021-10-28 22:16 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19 13:41 [Ocfs2-devel] [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 01/17] iov_iter: Fix iov_iter_get_pages{, _alloc} page fault return value Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 02/17] powerpc/kvm: Fix kvm_use_magic_page Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 03/17] gup: Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 04/17] iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 05/17] iov_iter: Introduce fault_in_iov_iter_writeable Andreas Gruenbacher
2021-10-20 16:25   ` Catalin Marinas
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 06/17] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 07/17] gfs2: Clean up function may_grant Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 08/17] gfs2: Introduce flag for glock holder auto-demotion Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 09/17] gfs2: Move the inode glock locking to gfs2_file_buffered_write Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 10/17] gfs2: Eliminate ip->i_gh Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 11/17] gfs2: Fix mmap + page fault deadlocks for buffered I/O Andreas Gruenbacher
2021-10-19 13:41 ` [Ocfs2-devel] [PATCH v8 12/17] iomap: Fix iomap_dio_rw return value for user copies Andreas Gruenbacher
2021-10-19 13:42 ` [Ocfs2-devel] [PATCH v8 13/17] iomap: Support partial direct I/O on user copy failures Andreas Gruenbacher
2021-10-19 13:42 ` [Ocfs2-devel] [PATCH v8 14/17] iomap: Add done_before argument to iomap_dio_rw Andreas Gruenbacher
2021-10-19 15:51   ` Darrick J. Wong
2021-10-19 19:30     ` Andreas Gruenbacher
2021-10-20  1:57       ` Darrick J. Wong
2021-10-19 13:42 ` [Ocfs2-devel] [PATCH v8 15/17] gup: Introduce FOLL_NOFAULT flag to disable page faults Andreas Gruenbacher
2021-10-19 13:42 ` [Ocfs2-devel] [PATCH v8 16/17] iov_iter: Introduce nofault " Andreas Gruenbacher
2021-10-19 13:42 ` [Ocfs2-devel] [PATCH v8 17/17] gfs2: Fix mmap + page fault deadlocks for direct I/O Andreas Gruenbacher
2021-10-19 15:40 ` [Ocfs2-devel] [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Linus Torvalds
2021-10-19 16:00   ` Bob Peterson
2021-10-20 16:36   ` Catalin Marinas
2021-10-20 20:11     ` Linus Torvalds
2021-10-20 22:44       ` Catalin Marinas
2021-10-21  6:19         ` Linus Torvalds
2021-10-22 18:06           ` Catalin Marinas
2021-10-22 19:22             ` Linus Torvalds
2021-10-25 19:00               ` Andreas Gruenbacher
2021-10-26 18:24                 ` Catalin Marinas
2021-10-26 18:50                   ` Linus Torvalds
2021-10-26 19:18                     ` Linus Torvalds
2021-10-27 19:13                     ` Catalin Marinas
2021-10-27 21:14                       ` Linus Torvalds
2021-10-28 21:20                         ` Catalin Marinas
2021-10-28 21:40                           ` Catalin Marinas
2021-10-28 22:15                           ` Andreas Grünbacher [this message]
2021-10-29 12:50                             ` Catalin Marinas
2021-10-28 22:32                           ` Linus Torvalds
2021-10-29 17:50                             ` Catalin Marinas
2021-10-29 18:47                               ` Linus Torvalds
2021-10-25 18:24             ` Andreas Gruenbacher
2021-10-26  5:12               ` Theodore Ts'o
2021-10-26  9:44               ` Andreas Gruenbacher
2021-10-27 21:21               ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHpGcMLeiXSjCJGY6SCJJ=bdNOspHLHofmTE8aC_sZtfHRG5ZA@mail.gmail.com' \
    --to=andreas.gruenbacher@gmail.com \
    --cc=agruenba@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=cluster-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=paulus@ozlabs.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).