live-patching.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BPF tracing trampoline synchronization between update/freeing and execution?
@ 2020-01-06 16:39 Jann Horn
  2020-01-06 16:56 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Jann Horn @ 2020-01-06 16:39 UTC (permalink / raw)
  To: bpf, live-patching, Alexei Starovoitov, Daniel Borkmann, Steven Rostedt
  Cc: KP Singh, Andy Lutomirski, kernel list, the arch/x86 maintainers,
	Josh Poimboeuf, Ingo Molnar

Hi!

I was chatting with kpsingh about BPF trampolines, and I noticed that
it looks like BPF trampolines (as of current bpf-next/master) seem to
be missing synchronization between trampoline code updates and
trampoline execution. Or maybe I'm missing something?

If I understand correctly, trampolines are executed directly from the
fentry placeholders at the start of arbitrary kernel functions, so
they can run without any locks held. So for example, if task A starts
executing a trampoline on entry to sys_open(), then gets preempted in
the middle of the trampoline, and then task B quickly calls
BPF_RAW_TRACEPOINT_OPEN twice, and then task A continues execution,
task A will end up executing the middle of newly-written machine code,
which can probably end up crashing the kernel somehow?

I think that at least to synchronize trampoline text freeing with
concurrent trampoline execution, it is necessary to do something
similar to what the livepatching code does with klp_check_stack(), and
then either use a callback from the scheduler to periodically re-check
tasks that were in the trampoline or let the trampoline tail-call into
a cleanup helper that is part of normal kernel text. And you'd
probably have to gate BPF trampolines on
CONFIG_HAVE_RELIABLE_STACKTRACE.

[Trampoline *updates* could probably be handled more easily if a
trampoline consisted of an immutable header that increments something
like a percpu refcount followed by a mutable body, but given that that
doesn't solve freeing trampolines, I'm not sure if it'd be worth the
effort. Unless you just never free trampoline memory, but that's
probably not a great idea.]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BPF tracing trampoline synchronization between update/freeing and execution?
  2020-01-06 16:39 BPF tracing trampoline synchronization between update/freeing and execution? Jann Horn
@ 2020-01-06 16:56 ` Peter Zijlstra
  2020-01-06 22:29   ` Alexei Starovoitov
  2020-01-07  8:28   ` Petr Mladek
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-01-06 16:56 UTC (permalink / raw)
  To: Jann Horn
  Cc: bpf, live-patching, Alexei Starovoitov, Daniel Borkmann,
	Steven Rostedt, KP Singh, Andy Lutomirski, kernel list,
	the arch/x86 maintainers, Josh Poimboeuf, Ingo Molnar

On Mon, Jan 06, 2020 at 05:39:30PM +0100, Jann Horn wrote:
> Hi!
> 
> I was chatting with kpsingh about BPF trampolines, and I noticed that
> it looks like BPF trampolines (as of current bpf-next/master) seem to
> be missing synchronization between trampoline code updates and
> trampoline execution. Or maybe I'm missing something?
> 
> If I understand correctly, trampolines are executed directly from the
> fentry placeholders at the start of arbitrary kernel functions, so
> they can run without any locks held. So for example, if task A starts
> executing a trampoline on entry to sys_open(), then gets preempted in
> the middle of the trampoline, and then task B quickly calls
> BPF_RAW_TRACEPOINT_OPEN twice, and then task A continues execution,
> task A will end up executing the middle of newly-written machine code,
> which can probably end up crashing the kernel somehow?
> 
> I think that at least to synchronize trampoline text freeing with
> concurrent trampoline execution, it is necessary to do something
> similar to what the livepatching code does with klp_check_stack(), and
> then either use a callback from the scheduler to periodically re-check
> tasks that were in the trampoline or let the trampoline tail-call into
> a cleanup helper that is part of normal kernel text. And you'd
> probably have to gate BPF trampolines on
> CONFIG_HAVE_RELIABLE_STACKTRACE.

ftrace uses synchronize_rcu_tasks() to flip between trampolines iirc.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BPF tracing trampoline synchronization between update/freeing and execution?
  2020-01-06 16:56 ` Peter Zijlstra
@ 2020-01-06 22:29   ` Alexei Starovoitov
  2020-01-07  8:28   ` Petr Mladek
  1 sibling, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2020-01-06 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jann Horn, bpf, live-patching, Alexei Starovoitov,
	Daniel Borkmann, Steven Rostedt, KP Singh, Andy Lutomirski,
	kernel list, the arch/x86 maintainers, Josh Poimboeuf,
	Ingo Molnar

On Mon, Jan 06, 2020 at 05:56:54PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 06, 2020 at 05:39:30PM +0100, Jann Horn wrote:
> > Hi!
> > 
> > I was chatting with kpsingh about BPF trampolines, and I noticed that
> > it looks like BPF trampolines (as of current bpf-next/master) seem to
> > be missing synchronization between trampoline code updates and
> > trampoline execution. Or maybe I'm missing something?
> > 
> > If I understand correctly, trampolines are executed directly from the
> > fentry placeholders at the start of arbitrary kernel functions, so
> > they can run without any locks held. So for example, if task A starts
> > executing a trampoline on entry to sys_open(), then gets preempted in
> > the middle of the trampoline, and then task B quickly calls
> > BPF_RAW_TRACEPOINT_OPEN twice, and then task A continues execution,
> > task A will end up executing the middle of newly-written machine code,
> > which can probably end up crashing the kernel somehow?
> > 
> > I think that at least to synchronize trampoline text freeing with
> > concurrent trampoline execution, it is necessary to do something
> > similar to what the livepatching code does with klp_check_stack(), and
> > then either use a callback from the scheduler to periodically re-check
> > tasks that were in the trampoline or let the trampoline tail-call into
> > a cleanup helper that is part of normal kernel text. And you'd
> > probably have to gate BPF trampolines on
> > CONFIG_HAVE_RELIABLE_STACKTRACE.
> 
> ftrace uses synchronize_rcu_tasks() to flip between trampolines iirc.

good catch and good suggestion. synchronize_rcu_tasks() is needed here too.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BPF tracing trampoline synchronization between update/freeing and execution?
  2020-01-06 16:56 ` Peter Zijlstra
  2020-01-06 22:29   ` Alexei Starovoitov
@ 2020-01-07  8:28   ` Petr Mladek
  2020-01-07  9:03     ` Petr Mladek
  1 sibling, 1 reply; 5+ messages in thread
From: Petr Mladek @ 2020-01-07  8:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jann Horn, bpf, live-patching, Alexei Starovoitov,
	Daniel Borkmann, Steven Rostedt, KP Singh, Andy Lutomirski,
	kernel list, the arch/x86 maintainers, Josh Poimboeuf,
	Ingo Molnar

On Mon 2020-01-06 17:56:54, Peter Zijlstra wrote:
> On Mon, Jan 06, 2020 at 05:39:30PM +0100, Jann Horn wrote:
> > Hi!
> > 
> > I was chatting with kpsingh about BPF trampolines, and I noticed that
> > it looks like BPF trampolines (as of current bpf-next/master) seem to
> > be missing synchronization between trampoline code updates and
> > trampoline execution. Or maybe I'm missing something?
> > 
> > If I understand correctly, trampolines are executed directly from the
> > fentry placeholders at the start of arbitrary kernel functions, so
> > they can run without any locks held. So for example, if task A starts
> > executing a trampoline on entry to sys_open(), then gets preempted in
> > the middle of the trampoline, and then task B quickly calls
> > BPF_RAW_TRACEPOINT_OPEN twice, and then task A continues execution,
> > task A will end up executing the middle of newly-written machine code,
> > which can probably end up crashing the kernel somehow?
> > 
> > I think that at least to synchronize trampoline text freeing with
> > concurrent trampoline execution, it is necessary to do something
> > similar to what the livepatching code does with klp_check_stack(), and
> > then either use a callback from the scheduler to periodically re-check
> > tasks that were in the trampoline or let the trampoline tail-call into
> > a cleanup helper that is part of normal kernel text. And you'd
> > probably have to gate BPF trampolines on
> > CONFIG_HAVE_RELIABLE_STACKTRACE.
> 
> ftrace uses synchronize_rcu_tasks() to flip between trampolines iirc.

ftrace calls also schedule_on_each_cpu(ftrace_sync) to handle
situations where RCU is not watching, see rcu_is_watching().

The following is called in ftrace_shutdown():

	schedule_on_each_cpu(ftrace_sync);

	if (IS_ENABLED(CONFIG_PREEMPTION))
		synchronize_rcu_tasks();

	arch_ftrace_trampoline_free(ops);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BPF tracing trampoline synchronization between update/freeing and execution?
  2020-01-07  8:28   ` Petr Mladek
@ 2020-01-07  9:03     ` Petr Mladek
  0 siblings, 0 replies; 5+ messages in thread
From: Petr Mladek @ 2020-01-07  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jann Horn, bpf, live-patching, Alexei Starovoitov,
	Daniel Borkmann, Steven Rostedt, KP Singh, Andy Lutomirski,
	kernel list, the arch/x86 maintainers, Josh Poimboeuf,
	Ingo Molnar

On Tue 2020-01-07 09:28:42, Petr Mladek wrote:
> On Mon 2020-01-06 17:56:54, Peter Zijlstra wrote:
> > On Mon, Jan 06, 2020 at 05:39:30PM +0100, Jann Horn wrote:
> > > Hi!
> > > 
> > > I was chatting with kpsingh about BPF trampolines, and I noticed that
> > > it looks like BPF trampolines (as of current bpf-next/master) seem to
> > > be missing synchronization between trampoline code updates and
> > > trampoline execution. Or maybe I'm missing something?
> > > 
> > > If I understand correctly, trampolines are executed directly from the
> > > fentry placeholders at the start of arbitrary kernel functions, so
> > > they can run without any locks held. So for example, if task A starts
> > > executing a trampoline on entry to sys_open(), then gets preempted in
> > > the middle of the trampoline, and then task B quickly calls
> > > BPF_RAW_TRACEPOINT_OPEN twice, and then task A continues execution,
> > > task A will end up executing the middle of newly-written machine code,
> > > which can probably end up crashing the kernel somehow?
> > > 
> > > I think that at least to synchronize trampoline text freeing with
> > > concurrent trampoline execution, it is necessary to do something
> > > similar to what the livepatching code does with klp_check_stack(), and
> > > then either use a callback from the scheduler to periodically re-check
> > > tasks that were in the trampoline or let the trampoline tail-call into
> > > a cleanup helper that is part of normal kernel text. And you'd
> > > probably have to gate BPF trampolines on
> > > CONFIG_HAVE_RELIABLE_STACKTRACE.
> > 
> > ftrace uses synchronize_rcu_tasks() to flip between trampolines iirc.
> 
> ftrace calls also schedule_on_each_cpu(ftrace_sync) to handle
> situations where RCU is not watching, see rcu_is_watching().
> 
> The following is called in ftrace_shutdown():
> 
> 	schedule_on_each_cpu(ftrace_sync);
> 
> 	if (IS_ENABLED(CONFIG_PREEMPTION))
> 		synchronize_rcu_tasks();
> 
> 	arch_ftrace_trampoline_free(ops);

Just to be sure. IMHO, the above should be enough to decide when
a ftrace-like trampoline could be freed. But it is not enough
to decide whether any task is still in the middle of the BPF
program that the trampoline jumped to.

You will likely need something more complicated to decide when it is safe
to free the BPF program itself. The stack check probably won't help.
I guess that the stack will be marked as unsafe in BPF program because
there will be no data for ORC unwinder. But some simple reference
counting might do the job.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-01-07  9:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-06 16:39 BPF tracing trampoline synchronization between update/freeing and execution? Jann Horn
2020-01-06 16:56 ` Peter Zijlstra
2020-01-06 22:29   ` Alexei Starovoitov
2020-01-07  8:28   ` Petr Mladek
2020-01-07  9:03     ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).