Re: Instrumentation and RCU

From: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Alexei Starovoitov <ast@kernel.org>, paulmck <paulmck@kernel.org>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Jason Wessel <jason.wessel@windriver.com>
Subject: Re: Instrumentation and RCU
Date: Wed, 11 Mar 2020 16:48:26 +0900	[thread overview]
Message-ID: <20200311164826.0e7eab4bcf7d58b922d59ae9@kernel.org> (raw)
In-Reply-To: <831351096.24668.1583887061530.JavaMail.zimbra@efficios.com>

On Tue, 10 Mar 2020 20:37:41 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Mar 10, 2020, at 8:18 PM, Masami Hiramatsu mhiramat@kernel.org wrote:
> [...]
>  
> >> An approach where the "in_tracer" flag is tested and set by the instrumentation
> >> (function tracer, kprobes, tracepoints) would work here. Let's say the beginning
> >> of the int3 ISR is part of the code which is invisible to instrumentation, and
> >> before we issue rcu_nmi_enter(), we handle the in_tracer flag:
> >> 
> >> rcu_nmi_enter();
> >>  <int3>
> >>     (recursion_ctx->in_tracer == false)
> >>     set recursion_ctx->in_tracer = true
> >>     do_int3() {
> >>        rcu_nmi_enter();
> >>          <int3>
> >>             if (recursion_ctx->in_tracer == true)
> >>                 iret
> >> 
> >> We can change "in_tracer" for "in_breakpoint", "in_tracepoint" and
> >> "in_function_trace" if we ever want to allow different types of instrumentation
> >> to nest. I'm not sure whether this is useful or not through.
> > 
> > Kprobes already has its own "in_kprobe" flag, and the recursion path is
> > not so simple. Since the int3 replaces the original instruction, we have to
> > execute the original instruction with single-step and fixup.
> > 
> > This means it involves do_debug() too. Thus, we can not do iret directly
> > from do_int3 like above, but if recursion happens, we have no way to
> > recover to origonal execution path (and call BUG()).
> 
> I think that all the code involved when hitting a breakpoint which would
> be the minimal subset required to act as if the kprobe was not there in the
> first place (single-step, fixup) should be hidden from kprobes
> instrumentation. I suspect this is the current intent today with noprobe
> annotations, but Thomas' proposal brings this a step further.
> 
> However, any other kprobe code (and tracer callbacks) beyond that
> minimalistic "effect-less" kprobe could be protected by a
> per-recursion-context in_kprobe flag.

Would you mean "in_kprobe" flag will prevent recursive execution of
kprobes but not prevent other tracer like tracepoint? If so, it is
already done I think. As I pointed, kprobe itself has in_kprobe like
flag for checking re-entrance. Thus the kprobe handler can call the
function which has a tracepoint safely.

Anyway, I agree with you to port all kprobe int3/debug handling parts
to the effect-less (offlimit) area, except for its pre/post handlers.

> > As my previous email, I showed a patch which is something like
> > "bust_kprobes()" for oops path. That is not safe but no other way to escape
> > from this recursion hell. (Maybe we can try to call it instead of calling
> > BUG() so that the kernel can continue to run, but I'm not sure we can
> > safely make the pagetable to readonly again.)
> 
> As long as we provide a minimalistic "effect-less" kprobe implementation
> in a non-instrumentable section which can be used whenever we are in a
> recursion scenario, I think we could achieve something recursion-free without
> requiring a bust_kprobes() work-around.

Yeah, I hope so. The bust_kprobes() is something like an emergency escape
hammer which everyone hopes never be used :)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>