All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Song Liu <songliubraving@fb.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH 3/3] x86/ftrace: Use text_poke()
Date: Wed, 23 Oct 2019 12:34:43 -0700	[thread overview]
Message-ID: <20191023193442.35lhhrqnyn3bfwpq@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20191023122307.756b4978@gandalf.local.home>

On Wed, Oct 23, 2019 at 12:23:06PM -0400, Steven Rostedt wrote:
> On Tue, 22 Oct 2019 14:58:43 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > Neither of two statements are true. The per-function generated trampoline
> > I'm talking about is bpf specific. For a function with two arguments it's just:
> > push rbp 
> > mov rbp, rsp
> > push rdi
> > push rsi
> > lea  rdi,[rbp-0x10]
> > call jited_bpf_prog
> > pop rsi
> > pop rdi
> > leave
> > ret
> > 
> > fentry's nop is replaced with call to the above.
> > That's it.
> > kprobe and live patching has no use out of it.
> > 
> 
> Below is a patch that allows you to do this, and you don't need to
> worry about patching the nops. And it also allows to you hook directly
> to any function and still allow kprobes and tracing on those same
> functions (as long as they don't modify the ip, but in the future, we
> may be able to allow that too!). And this code does not depend on
> Peter's code either.
> 
> All you need to do is:
> 
> 	register_ftrace_direct((unsigned long)func_you_want_to_trace,
> 			       (unsigned long)your_trampoline);
> 
> 
> I added to trace-event-samples.c:
> 
> void my_direct_func(raw_spinlock_t *lock)
> {
> 	trace_printk("taking %p\n", lock);
> }
> 
> extern void my_tramp(void *);
> 
> asm (
> "       .pushsection    .text, \"ax\", @progbits\n"
> "   my_tramp:"
> #if 1
> "       pushq %rbp\n"
> "	movq %rsp, %rbp\n"
> "	pushq %rdi\n"
> "	call my_direct_func\n"
> "	popq %rdi\n"
> "	leave\n"
> #endif
> "	ret\n"
> "       .popsection\n"
> );
> 
> 
> (the #if was for testing purposes)
> 
> And then in the module load and unload:
> 
> 	ret = register_ftrace_direct((unsigned long)do_raw_spin_lock,
> 				     (unsigned long)my_tramp);
> 
> 	unregister_ftrace_direct((unsigned long)do_raw_spin_lock,
> 				 (unsigned long)my_tramp);
> 
> respectively.
> 
> And what this does is if there's only a single callback to the
> registered function, it changes the nop in the function to call your
> trampoline directly (just like you want this to do!). But if we add
> another callback, a direct_ops ftrace_ops gets added to the list of the
> functions to go through, and this will set up the code to call your
> trampoline after it calls all the other callbacks.
> 
> The called trampoline will be called as if it was called directly from
> the nop.
> 
> OK, I wrote this up quickly, and it has some bugs, but nothing that
> can't be straighten out (specifically, the checks fail if you add a
> function trace to one of the direct callbacks, but this can be dealt
> with).
> 
> Note, the work needed to port this to other archs is rather minimal
> (just need to tweak the ftrace_regs_caller and have a way to pass back
> the call address via pt_regs that is not saved).
> 
> Alexei,
> 
> Would this work for you?

Yes!
Looks great. More comments below.

> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6adaf18b3365..de3372bd08ae 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -159,6 +159,7 @@ config X86
>  	select HAVE_DYNAMIC_FTRACE
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS
>  	select HAVE_DYNAMIC_FTRACE_WITH_IPMODIFY
> +	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	select HAVE_EBPF_JIT
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	select HAVE_EISA
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index c38a66661576..34da1e424391 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -28,6 +28,12 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  	return addr;
>  }
>  
> +static inline void ftrace_set_call_func(struct pt_regs *regs, unsigned long addr)
> +{
> +	/* Emulate a call */
> +	regs->orig_ax = addr;

This probably needs a longer comment :)

> +	.if \make_call
> +	movq ORIG_RAX(%rsp), %rax
> +	movq %rax, MCOUNT_REG_SIZE-8(%rsp)

reading asm helps.

> +config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +	bool
> +
>  config HAVE_FTRACE_MCOUNT_RECORD
>  	bool
>  	help
> @@ -565,6 +568,11 @@ config DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY
>  	depends on DYNAMIC_FTRACE
>  	depends on HAVE_DYNAMIC_FTRACE_WITH_IPMODIFY
>  
> +config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +	def_bool y
> +	depends on DYNAMIC_FTRACE
> +	depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS

It seems to me that it's a bit of overkill to add new config knob
for every ftrace feature.
HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS (that arch defined) would
be enough to check and return error in register_ftrace_direct()
right?

> -static struct ftrace_hash *
> -__ftrace_hash_move(struct ftrace_hash *src)
> +static void transfer_hash(struct ftrace_hash *dst, struct ftrace_hash *src)
>  {
>  	struct ftrace_func_entry *entry;
> -	struct hlist_node *tn;
>  	struct hlist_head *hhd;
> +	struct hlist_node *tn;
> +	int size;
> +	int i;
> +
> +	dst->flags = src->flags;
> +
> +	size = 1 << src->size_bits;
> +	for (i = 0; i < size; i++) {
> +		hhd = &src->buckets[i];
> +		hlist_for_each_entry_safe(entry, tn, hhd, hlist) {
> +			remove_hash_entry(src, entry);
> +			__add_hash_entry(dst, entry);

I don't quite follow why this is needed.
I thought alloc_and_copy_ftrace_hash() can already handle it.
If that is just unrelated cleanup then sure. Looks good.

> +struct ftrace_ops direct_ops = {
> +	.func		= call_direct_funcs,
> +	.flags		= FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_RECURSION_SAFE
> +#if 1
> +					| FTRACE_OPS_FL_DIRECT
> +#endif
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY
> +					| FTRACE_OPS_FL_SAVE_REGS
> +#endif

With FL_DIRECT the CONFIG_DYNAMIC_FTRACE_WITH_IPMODIFY_ONLY won't be needed, right ?
At least not for bpf use case.
Do you see livepatching using it or switching to FL_DIRECT too?

> +	ret = -ENOMEM;
> +	if (ftrace_hash_empty(direct_functions) ||
> +	    direct_functions->count > 2 * (1 << direct_functions->size_bits)) {
> +		struct ftrace_hash *new_hash;
> +		int size = ftrace_hash_empty(direct_functions) ? 0 :
> +			direct_functions->count + 1;
> +		int bits;
> +
> +		if (size < 32)
> +			size = 32;
> +
> +		for (size /= 2; size; size >>= 1)
> +			bits++;
> +
> +		/* Don't allocate too much */
> +		if (bits > FTRACE_HASH_MAX_BITS)
> +			bits = FTRACE_HASH_MAX_BITS;
> +
> +		new_hash = alloc_ftrace_hash(bits);
> +		if (!new_hash)
> +			goto out_unlock;
> +
> +		transfer_hash(new_hash, direct_functions);
> +		free_ftrace_hash(direct_functions);
> +		direct_functions = new_hash;

That's probably racy, no?
ftrace_get_addr_new() is not holding direct_mutex that
protects direct_functions.

> +	if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED))
> +		ret = register_ftrace_function(&direct_ops);

Having single direct_ops is nice.

> @@ -2370,6 +2542,10 @@ unsigned long ftrace_get_addr_new(struct dyn_ftrace *rec)
>  {
>  	struct ftrace_ops *ops;
>  
> +	if ((rec->flags & FTRACE_FL_DIRECT) &&
> +	    (ftrace_rec_count(rec) == 1))
> +		return find_rec_direct(rec->ip);
> +

I've started playing with this function as well to
implement 2nd nop approach I mentioned earlier.
I'm going to abandon it, since your approach is better.
It allows not only bpf, but anyone else to register direct.

I have one more question/request.
Looks like ftrace can be turned off with sysctl.
Which means that a person or a script can accidently turn it off
and all existing kprobe+bpf stuff that is ftrace based will
be silently switched off.
User services will be impacted and will make people unhappy.
I'm not sure how to solve the existing situation,
but for FL_DIRECT can you add a check that if particular
nop is modified to do direct call then the only interface to
turn it off is to call unregister_ftrace_direct().
There should no side channel to kill it.
ftrace_disable and ftrace_kill make me nervous too.
Fast forward a year and imagine few host critical bpf progs
are running in production and relying on FL_DIRECT.
Now somebody decided to test new ftrace feature and
it hit one of FTRACE_WARN_ON().
That will shutdown the whole ftrace and bpf progs
will stop executing. I'd like to avoid that.
May be I misread the code?


  parent reply	other threads:[~2019-10-23 19:42 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-27 18:06 [PATCH 0/3] Rewrite x86/ftrace to use text_poke() Peter Zijlstra
2019-08-27 18:06 ` [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
2019-10-03  5:00   ` Masami Hiramatsu
2019-10-03  8:27     ` Peter Zijlstra
2019-10-03 11:01       ` Peter Zijlstra
2019-10-03 12:32         ` Peter Zijlstra
2019-10-04 13:45         ` Masami Hiramatsu
2019-10-07  8:05           ` Peter Zijlstra
2019-10-09 13:07           ` x86/kprobes bug? (was: [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions) Peter Zijlstra
2019-10-09 13:26             ` Peter Zijlstra
2019-10-09 13:28               ` Peter Zijlstra
2019-10-09 14:26             ` Mathieu Desnoyers
2019-10-17 19:59               ` Peter Zijlstra
2019-10-03 13:05       ` [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
2019-08-27 18:06 ` [PATCH 2/3] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
2019-10-02 16:34   ` Daniel Bristot de Oliveira
2019-10-03  5:50   ` Masami Hiramatsu
2019-08-27 18:06 ` [PATCH 3/3] x86/ftrace: Use text_poke() Peter Zijlstra
2019-10-02 16:35   ` Daniel Bristot de Oliveira
2019-10-02 18:21     ` Peter Zijlstra
2019-10-03 22:10       ` Steven Rostedt
2019-10-04  8:10         ` Daniel Bristot de Oliveira
2019-10-04 13:40           ` Steven Rostedt
2019-10-04 14:44             ` Daniel Bristot de Oliveira
2019-10-04 15:13               ` Steven Rostedt
2019-10-07  8:08           ` Peter Zijlstra
2019-10-11  7:01           ` Peter Zijlstra
2019-10-11  7:37             ` Daniel Bristot de Oliveira
2019-10-11 10:57               ` Peter Zijlstra
2019-10-11 13:11               ` Steven Rostedt
2019-10-04 11:22         ` Peter Zijlstra
2019-10-04 13:42           ` Steven Rostedt
2019-10-22  0:36             ` Alexei Starovoitov
2019-10-22  0:43               ` Steven Rostedt
2019-10-22  3:10                 ` Alexei Starovoitov
2019-10-22  3:16                   ` Steven Rostedt
2019-10-22  3:19                     ` Steven Rostedt
2019-10-22  4:05                       ` Alexei Starovoitov
2019-10-22 11:19                         ` Steven Rostedt
2019-10-22 13:44                           ` Steven Rostedt
2019-10-22 17:50                             ` Alexei Starovoitov
2019-10-22 18:10                               ` Steven Rostedt
2019-10-22 20:46                                 ` Alexei Starovoitov
2019-10-22 21:04                                   ` Steven Rostedt
2019-10-22 21:58                                     ` Alexei Starovoitov
2019-10-22 22:17                                       ` Steven Rostedt
2019-10-23  2:02                                         ` Steven Rostedt
2019-10-22 22:45                                       ` Andy Lutomirski
2019-10-22 23:21                                         ` Steven Rostedt
2019-10-22 23:49                                         ` Alexei Starovoitov
2019-10-23  4:20                                           ` Andy Lutomirski
2019-10-23  9:02                                             ` Peter Zijlstra
2019-10-23 16:23                                       ` Steven Rostedt
2019-10-23 17:42                                         ` Steven Rostedt
2019-10-23 19:34                                         ` Alexei Starovoitov [this message]
2019-10-23 20:08                                           ` Steven Rostedt
2019-10-23 22:36                                             ` Alexei Starovoitov
2019-10-22  3:55                     ` Alexei Starovoitov
2019-10-03  5:52     ` Masami Hiramatsu
2019-08-28  7:22 ` [PATCH 0/3] Rewrite x86/ftrace to use text_poke() Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191023193442.35lhhrqnyn3bfwpq@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=bristot@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.