BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Nikolay Borisov <nborisov@suse.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, bpf <bpf@vger.kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
Date: Tue, 2 Feb 2021 09:52:49 -0500
Message-ID: <20210202095249.5abd6780@gandalf.local.home> (raw)
In-Reply-To: <YBktVT+z7sV/vEPU@hirez.programming.kicks-ass.net>

On Tue, 2 Feb 2021 11:45:41 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > The stack tracer checks the size of the stack, compares it to the
> > largest recorded size, and if it's bigger, it will save the stack. But
> > if this happens on two CPUs at the same time, only one can do the
> > recording at the same time. To synchronize this, a spin lock must be
> > taken. Similar to spin locks in an NMI.  
> 
> That sounds like something cmpxchg() should be able to do.
> 
> Have a per-cpu stack trace buffer and a global max one, when cpu local
> exceeds previous max, cmpxchg the buffer.
> 
> > But the problem here is, the callbacks can also be done from an NMI
> > context, so if we are in NMI, we don't want to take any locks, and
> > simply don't record the stack traces from NMIs.  
> 
> Which is obviously shit :-) The NMI might have interesting stack usage.

Actually, it only checks task stacks. It doesn't check IRQ stacks if
they are different than the task stack, because to do it properly, it
must know the size of the stack. The tracer currently masks the stack
pointer with THREAD_SIZE to find the top of the stack.

As other stacks may not be THREAD_SIZE, that won't work. It has been on
my TODO list (for a long time), to add an arch specific way to quickly find
the top of the stack.

> 
> > The more I think about it, the more I hate the idea that ftrace
> > callbacks and kprobes are considered NMIs. Simply because they are not!  
> 
> Yet they happen when IRQs are off, so they are ;-)

But from a handler, you could do:

	if (in_nmi())
		return;
	local_irq_save(flags);
	/* Now you are safe from being re-entrant. */

Where as there's no equivalent in a NMI handler. That's what makes
kprobe/ftrace handlers different than NMI handlers.

> 
> Also, given how everything can nest, it had better all be lockless
> anyway. You can get your regular function trace interrupted, which can
> hit a #DB, which can function trace, which can #BP which can function
> trace again which can get #NMI etc.. Many wonderfun nestings possible.

I would call #DB an #BP handlers very special.

Question: Do #DB and #BP set "in_interrupt()"? Because the function tracer
has infrastructure to prevent recursion in the same context. That is, a
ftrace handler calls something that gets traced, the recursion protection
will detect that and prevent the handler from being called again. But the
recursion protection is interrupt context aware and lets the handler get
called again if the recursion happens from a different context:

func:
   call ftrace_caller
      ftrace_caller:
          call ftrace_handler
              ftrace_handler() {

                   if (recursion_test()) <- false
                       return;

                   some_traced_func() {
                        call ftrace_caller
                               call ftrace_handler
                                    ftrace_handler() {
                                        if (recursion_test()) <- true
                                               return
                                    }
                    <interrupt>
                         func
                            call ftrace_caller
                                 call ftrace_handler
                                       ftrace_handler() {
                                            if (recursion_test()) <- false
                                                 return;
                                       /* continue */


If #DB and #BP do not change the in_interrupt() context, then the above
still will protect the ftrace handlers from recursion due to them.

> 
> And god knows what these handlers end up calling.
> 
> The only sane approach is treating it all as NMI and having it all
> lockless.

That would require refactoring all the code that's been around since 2008.

Worse yet. lockless is much more complex to get right. So this refactoring
will likely cause more bugs than it solves.

-- Steve

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <25cd2608-03c2-94b8-7760-9de9935fde64@suse.com>
     [not found] ` <20210128001353.66e7171b395473ef992d6991@kernel.org>
     [not found]   ` <20210128002452.a79714c236b69ab9acfa986c@kernel.org>
     [not found]     ` <a35a6f15-9ab1-917c-d443-23d3e78f2d73@suse.com>
     [not found]       ` <20210128103415.d90be51ec607bb6123b2843c@kernel.org>
2021-01-28  3:38         ` Masami Hiramatsu
2021-01-28  7:11           ` Nikolay Borisov
2021-01-28 16:12           ` Nikolay Borisov
2021-01-28 16:45             ` Nikolay Borisov
2021-01-28 16:50               ` Josh Poimboeuf
2021-01-28 21:52                 ` [PATCH] x86: Disable CET instrumentation in the kernel Josh Poimboeuf
2021-01-29  6:23                   ` Nikolay Borisov
2021-01-29 10:21                   ` Borislav Petkov
     [not found]                     ` <20210129151034.iba4eaa2fuxsipqa@treble>
2021-01-29 16:30                       ` Borislav Petkov
2021-01-29 16:49                         ` Josh Poimboeuf
2021-01-29 16:54                           ` Nikolay Borisov
2021-01-29 17:03                             ` Josh Poimboeuf
2021-01-29 17:07                               ` Borislav Petkov
2021-01-29 17:58                                 ` Seth Forshee
2021-01-28 18:24               ` kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") Peter Zijlstra
2021-01-29  1:34                 ` Alexei Starovoitov
2021-01-29  6:36                   ` Nikolay Borisov
     [not found]                   ` <YBPNyRyrkzw2echi@hirez.programming.kicks-ass.net>
     [not found]                     ` <20210129224011.81bcdb3eba1227c414e69e1f@kernel.org>
     [not found]                       ` <20210129105952.74dc8464@gandalf.local.home>
2021-01-29 16:24                         ` Peter Zijlstra
2021-01-29 17:45                           ` Alexei Starovoitov
2021-01-29 17:59                             ` Peter Zijlstra
2021-01-29 19:01                               ` Steven Rostedt
2021-01-29 21:05                                 ` Alexei Starovoitov
2021-01-30  1:41                                   ` Masami Hiramatsu
2021-01-29 21:24                                 ` Steven Rostedt
2021-01-30  8:28                                   ` Peter Zijlstra
2021-01-30 12:44                                     ` Steven Rostedt
2021-02-02 10:45                                       ` Peter Zijlstra
2021-02-02 14:52                                         ` Steven Rostedt [this message]
2021-02-02 16:45                                           ` Peter Zijlstra
2021-02-02 16:56                                             ` Steven Rostedt
2021-02-02 18:30                                               ` Peter Zijlstra
2021-02-02 21:05                                                 ` Steven Rostedt
2021-02-03 13:33                                                   ` Masami Hiramatsu
2021-02-03 13:52                                                     ` Steven Rostedt
2021-01-30  2:02                               ` Masami Hiramatsu
2021-01-30  3:08                                 ` Alexei Starovoitov
2021-01-30 12:10                                   ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202095249.5abd6780@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=nborisov@suse.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git