BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Nikolay Borisov <nborisov@suse.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, bpf <bpf@vger.kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
Date: Fri, 29 Jan 2021 13:05:33 -0800
Message-ID: <20210129210533.7s6udd3vobkgb27u@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210129140103.3ce971b7@gandalf.local.home>

On Fri, Jan 29, 2021 at 02:01:03PM -0500, Steven Rostedt wrote:
> On Fri, 29 Jan 2021 18:59:43 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jan 29, 2021 at 09:45:48AM -0800, Alexei Starovoitov wrote:
> > > Same things apply to bpf side. We can statically prove safety for
> > > ftrace and kprobe attaching whereas to deal with NMI situation we
> > > have to use run-time checks for recursion prevention, etc.  
> > 
> > I have no idea what you're saying. You can attach to functions that are
> > called with random locks held, you can create kprobes in some very
> > sensitive places.
> > 
> > What can you staticlly prove about that?
> I think the main difference is, if you attach a kprobe or ftrace function,
> you can theoretically analyze the location before you do the attachment.

When we're writing bpf helpers we need to carefully think about reentrance and NMI.
If the helper looks like:
int nokprobe notrace bpf_something(...)
  // access variables A and B

The implementation can rely on the fact that even if the helper is reentrant
the state of A and B will be consistent. Either both got touched or none.
Only NMI condition we have to worry about, because A could be modified 
without touching B.
If we write it as
int nokprobe bpf_something(...) { ... }
that would be another case.
Here we need to consider the case that bpf prog can be attached to it via fentry nop.
But no need to worry about instructions split in the middle because of kprobe via int3.
Since we have big "if (in_nmi()) goto unsupported;" check in the beginning we
only need to worry about combinations of kprobe at the start of the func,
kprobe anywhere inside the func via int3, and ftrace at the start.
Not having to think of NMI helps a ton.
My earlier this_cpu vs __this_cpu comment is an example of that.
If in_nmi is filtered early it's one implementation. If nmi has to be handled
it's completely different algorithm.
Now you've broke all this logic by making int3 to be marked as 'in_nmi' and
bpf in kprobe in the middle of the func are now broken.
Do people use that? Yeah they do.
We have to fix it.
What were your reasons to make int3 in_nmi?
I've read the commit log, but I don't see in it the actual motivation
for int3 other than "it looks like NMI to me. Let's make it so".
The commit logs talk about cpu exceptions. I agree that #DB and #MC do behave like NMI.
But #BP is not really. My understanding it's used by kprobes and text_poke_bp only.
If the motivation was to close some issue with text_poke_bp then, sure,
let's make handling of text_poke_bp to be treated as nmi.
But kprobe is not that.
I'm thinking either of the following solutions would be generic:
- introduce another state to preempt flags like "kernel exception"
- remove kprobe's int3 from in_nmi
As bpf specific alternative we can do:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..37cc549ad52e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -96,7 +96,7 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
        unsigned int ret;

-       if (in_nmi()) /* not supported yet */
+       if (in_nmi() && !kprobe_running()) /* not supported yet */
                return 1;

but imo it's less clean than the first two options.

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <25cd2608-03c2-94b8-7760-9de9935fde64@suse.com>
     [not found] ` <20210128001353.66e7171b395473ef992d6991@kernel.org>
     [not found]   ` <20210128002452.a79714c236b69ab9acfa986c@kernel.org>
     [not found]     ` <a35a6f15-9ab1-917c-d443-23d3e78f2d73@suse.com>
     [not found]       ` <20210128103415.d90be51ec607bb6123b2843c@kernel.org>
2021-01-28  3:38         ` Masami Hiramatsu
2021-01-28  7:11           ` Nikolay Borisov
2021-01-28 16:12           ` Nikolay Borisov
2021-01-28 16:45             ` Nikolay Borisov
2021-01-28 16:50               ` Josh Poimboeuf
2021-01-28 21:52                 ` [PATCH] x86: Disable CET instrumentation in the kernel Josh Poimboeuf
2021-01-29  6:23                   ` Nikolay Borisov
2021-01-29 10:21                   ` Borislav Petkov
     [not found]                     ` <20210129151034.iba4eaa2fuxsipqa@treble>
2021-01-29 16:30                       ` Borislav Petkov
2021-01-29 16:49                         ` Josh Poimboeuf
2021-01-29 16:54                           ` Nikolay Borisov
2021-01-29 17:03                             ` Josh Poimboeuf
2021-01-29 17:07                               ` Borislav Petkov
2021-01-29 17:58                                 ` Seth Forshee
2021-01-28 18:24               ` kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") Peter Zijlstra
2021-01-29  1:34                 ` Alexei Starovoitov
2021-01-29  6:36                   ` Nikolay Borisov
     [not found]                   ` <YBPNyRyrkzw2echi@hirez.programming.kicks-ass.net>
     [not found]                     ` <20210129224011.81bcdb3eba1227c414e69e1f@kernel.org>
     [not found]                       ` <20210129105952.74dc8464@gandalf.local.home>
2021-01-29 16:24                         ` Peter Zijlstra
2021-01-29 17:45                           ` Alexei Starovoitov
2021-01-29 17:59                             ` Peter Zijlstra
2021-01-29 19:01                               ` Steven Rostedt
2021-01-29 21:05                                 ` Alexei Starovoitov [this message]
2021-01-30  1:41                                   ` Masami Hiramatsu
2021-01-29 21:24                                 ` Steven Rostedt
2021-01-30  8:28                                   ` Peter Zijlstra
2021-01-30 12:44                                     ` Steven Rostedt
2021-02-02 10:45                                       ` Peter Zijlstra
2021-02-02 14:52                                         ` Steven Rostedt
2021-02-02 16:45                                           ` Peter Zijlstra
2021-02-02 16:56                                             ` Steven Rostedt
2021-02-02 18:30                                               ` Peter Zijlstra
2021-02-02 21:05                                                 ` Steven Rostedt
2021-02-03 13:33                                                   ` Masami Hiramatsu
2021-02-03 13:52                                                     ` Steven Rostedt
2021-01-30  2:02                               ` Masami Hiramatsu
2021-01-30  3:08                                 ` Alexei Starovoitov
2021-01-30 12:10                                   ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210129210533.7s6udd3vobkgb27u@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=nborisov@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git