linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	David Vernet <void@manifault.com>,
	dthaler@microsoft.com, brauner@kernel.org, hch@infradead.org
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Date: Tue, 16 May 2023 18:46:29 -0700	[thread overview]
Message-ID: <CAHk-=wj1hh=ZUriY9pVFvD1MjqbRuzHc4yz=S2PCW7u3W0-_BQ@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whBKoovtifU2eCeyuBBee-QMcbxdXDLv0mu0k2DgxiaOw@mail.gmail.com>

On Tue, May 16, 2023 at 5:56 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That code depends on all the usual MM locking, and it does not work at
> all in the same way that "pagefault_disable()" does, for example.

Note that in interrupt context, the page table locking would also
deadlock, so even though it's using spinlocks and that _could_ be
atomic, it really isn't usable in interrupt (much less NMI) context.

And even if you're in process context, but atomic due to other reasons
(ie spinlocks or RCU), while the page table locking would be ok, the
mm locking isn't.

So FOLL_NOFAULT really is about avoiding faulting in any new pages,
and it's kind of like "GFP_NOIO" in that respect: it's meant for
filesystems etc that can not afford to fault, because they may hold
locks, and a page fault could then recurse back into the filesystem
and deadlock.

It's not about atomicity or anything like that.

Similarly, FOLL_NOWAIT is absolutely not about that - it will actually
start to fault things in, it just won't then wait for the IO to
complete (so think "don't wait for IO" rather than any "avoid
locking").

Anyway, the end result is the one I already mentioned: only
"get_user_page[s]_fast_only()" is about being usable in atomic
context, and that only works on the *current* process.

That one really is designed to be kind of like "pagefault_disable()",
except instead of fetching user data, it fetches the "reference" to
the user datat (ie the pointers to the underlying pages).

In fact, the very reason that *one* GUP function works in atomic
context is pretty much the exact same reason that you can try to fault
in user pages under pagefault_disable(): it doesn't actually use any
of the generic VM data structures, and accesses the page tables
directly. Exactly like the hardware does.

So don't even ask for other GUP functionality, much less the "remote"
kind. Not going to happen. If you think you need access to remote
process memory, you had better do it in process context, or you had
better just think again.

               Linus

  reply	other threads:[~2023-05-17  1:46 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 16:37 [PATCH] tracing/user_events: Run BPF program if attached Beau Belgrave
2023-05-09 15:24 ` Alexei Starovoitov
2023-05-09 17:01   ` Steven Rostedt
2023-05-09 20:30     ` Steven Rostedt
2023-05-09 20:42       ` Steven Rostedt
2023-05-15 16:57       ` Alexei Starovoitov
2023-05-15 18:33         ` Steven Rostedt
2023-05-15 19:35           ` Beau Belgrave
2023-05-15 21:38             ` Steven Rostedt
2023-05-15 19:24         ` Beau Belgrave
2023-05-15 21:57           ` Steven Rostedt
2023-05-17  0:36           ` Alexei Starovoitov
2023-05-17  0:56             ` Linus Torvalds
2023-05-17  1:46               ` Linus Torvalds [this message]
2023-05-17  2:29                 ` Steven Rostedt
2023-05-17  3:03                   ` Linus Torvalds
2023-05-17 17:22                     ` Beau Belgrave
2023-05-17 18:15                       ` Linus Torvalds
2023-05-17 19:07                         ` Beau Belgrave
2023-05-17 19:26                           ` Linus Torvalds
2023-05-17 19:36                             ` Beau Belgrave
2023-05-17 19:36                             ` Linus Torvalds
2023-05-17 19:37                               ` Linus Torvalds
2023-05-17 23:00                                 ` Beau Belgrave
2023-05-17 23:14                                   ` Linus Torvalds
2023-05-17 23:25                                     ` Steven Rostedt
2023-05-18  0:14                                       ` Beau Belgrave
2023-05-18  0:23                                         ` Linus Torvalds
2023-05-17 20:08                               ` Linus Torvalds
2023-05-17  1:26             ` Steven Rostedt
2023-05-17 16:50               ` Beau Belgrave
2023-05-18  0:10                 ` Alexei Starovoitov
2023-05-18  0:19                   ` Beau Belgrave
2023-05-18  0:56                     ` Alexei Starovoitov
2023-05-18  1:18                       ` Beau Belgrave
2023-05-18  2:08                         ` Steven Rostedt
2023-05-18  3:14                           ` Alexei Starovoitov
2023-05-18 13:36                             ` Steven Rostedt
2023-05-18 17:28                               ` Beau Belgrave
2023-06-01  9:46                   ` Christian Brauner
2023-06-01 15:24                     ` Beau Belgrave
2023-06-01 15:57                       ` Christian Brauner
2023-06-01 16:29                         ` Beau Belgrave
2023-06-06 13:37                           ` Masami Hiramatsu
2023-06-06 17:05                             ` Beau Belgrave
2023-06-07 14:07                               ` Masami Hiramatsu
2023-06-07 19:26                                 ` Beau Belgrave
2023-06-08  0:25                                   ` Masami Hiramatsu
2023-05-17 17:51             ` Beau Belgrave
2023-06-06 13:57             ` Masami Hiramatsu
2023-06-06 16:57               ` Andrii Nakryiko
2023-06-06 20:57                 ` Beau Belgrave

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wj1hh=ZUriY9pVFvD1MjqbRuzHc4yz=S2PCW7u3W0-_BQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=beaub@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dthaler@microsoft.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).