From: Steven Rostedt <rostedt@goodmis.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Song Liu <songliubraving@fb.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH 3/3] x86/ftrace: Use text_poke()
Date: Tue, 22 Oct 2019 14:10:21 -0400
Message-ID: <20191022141021.2c4496c2@gandalf.local.home>
In-Reply-To: <20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com>

On Tue, 22 Oct 2019 10:50:56 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > +static void my_hijack_func(unsigned long ip, unsigned long pip,
> > +			   struct ftrace_ops *ops, struct pt_regs *regs)  
> 
> 1.
> To pass regs into the callback ftrace_regs_caller() has huge amount
> of stores to do. Saving selector registers is not fast. pushf/popf are even slower.
> ~200 bytes of stack is being used for save/restore.
> This is pure overhead that bpf execution cannot afford.
> bpf is different from traditional ftrace and other tracing, since
> it's often active 24/7. Every nanosecond counts.

Live patching has the same requirements as what you have, if not even
more critical ones.

Note, it would be easy to also implement a "just give me IP regs" flag,
or have that be the default if IPMODIFY is set and REGS is not.


> So for bpf I'm generating assembler trampoline tailored to specific kernel
> function that has its fentry nop replaced with a call to that trampoline.
> Instead of 20+ register save I save only arguments of that kernel function.
> For example the trampoline that attaches to kfree_skb() will save only two registers
> (rbp and rdi on x86) and will use 16 bytes of stack.
> 
> 2.
> The common ftrace callback api allows ftrace infra to use generic ftrace_ops_list_func()
> that works for all ftracers, but it doesn't scale.

That only happens if you have more than one callback to the same
function. Otherwise you get a dedicated trampoline.
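
To make the distinction concrete, here is a minimal userspace model of that
dispatch decision. It is only a sketch: struct model_ops, struct traced_site,
attach(), site_enter() and count_hit() are hypothetical names made up for
illustration, not kernel APIs, and the model leaves out the per-ops hash
lookup that the real list function performs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
typedef void (*callback_fn)(unsigned long ip, void *data);

struct model_ops {
	callback_fn func;
	void *data;
	struct model_ops *next;
};

struct traced_site {
	unsigned long ip;
	struct model_ops *ops;	/* callbacks attached to this function */
	int list_walks;		/* how many times the generic list path ran */
};

/* Attach one more callback to the traced function. */
static void attach(struct traced_site *site, struct model_ops *ops)
{
	assert(ops->func != NULL);
	ops->next = site->ops;
	site->ops = ops;
}

/* Models what runs at function entry. */
static void site_enter(struct traced_site *site)
{
	struct model_ops *ops = site->ops;

	if (!ops)
		return;		/* nothing attached: the site is a nop */

	if (!ops->next) {
		/* One callback: call it directly, like a dedicated trampoline. */
		ops->func(site->ip, ops->data);
		return;
	}

	/* More than one callback: fall back to walking the list. */
	site->list_walks++;
	for (; ops; ops = ops->next)
		ops->func(site->ip, ops->data);
}

/* Example callback: bump the counter passed in as data. */
static void count_hit(unsigned long ip, void *data)
{
	(void)ip;
	++*(int *)data;
}
```

With a single attached model_ops, site_enter() never touches list_walks; the
list path is only taken once a second callback is attached to the same site.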


> We see different teams writing bpf services that attach to the same function.
> In addition there are 30+ kprobes active in other places, so for every
> fentry the ftrace_ops_list_func() walks long linked list and does hash
> lookup for each. That's not efficient and we see this slowdown in practice.
> Because there is a unique trampoline for every kernel function, a single
> generic list caller is not possible.
> Instead, the generated unique trampoline handles all attached bpf programs
> for this particular kernel function in a sequence of calls.

Why not have a single ftrace_ops() that calls this utility and does the
multiplexing then?

> No link lists to walk, no hash tables to lookup.
> All overhead is gone.
> 
> 3.
> The existing kprobe bpf progs are using pt_regs to read arguments. Due to

That was done because kprobes in general work off of int3. And the
saving of pt_regs was to reuse the code and allow kprobes to work both
with and without an ftrace helper.

> that ugliness all of them are written for single architecture (x86-64).
> Porting them to arm64 is not that hard, but porting to 32-bit arch is close
> to impossible. With custom generated trampoline we'll have bpf progs that
> work as-is on all archs. raw_tracepoint bpf progs already demonstrated
> that such portability is possible. This new kprobe++ progs will be similar.
> 
> 4.
> Due to uniqueness of bpf trampoline sharing trampoline between ftracers
> and bpf progs is not possible, so users would have to choose whether to
> ftrace that particular kernel function or attach bpf to it.
> Attachments are not going to stomp on each other. I'm reusing the
> ftrace_make_call/nop approach that checks that it's a 'nop' being replaced.

What about the approach I showed here? Just register a ftrace_ops with
ip modify set, and then call your unique trampoline directly.

It would keep the modification all in one place instead of having
multiple implementations of it. We can make ftrace call your trampoline
just like it was called directly, without writing a whole new
infrastructure.
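
The idea of a callback that only rewrites the resume address can be modelled
in userspace as follows. This is a rough sketch, not kernel code: struct
model_regs, redirect_hook(), call_traced(), original_func() and
custom_trampoline() are hypothetical names, and a function pointer stands in
for the saved instruction pointer that an ip-modifying callback would change.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
typedef int (*target_fn)(int);

struct model_regs {
	target_fn ip;	/* where execution resumes after the callback */
};

typedef void (*hook_fn)(struct model_regs *regs);

/* The traced function, and the custom trampoline that should run instead. */
static int original_func(int x)
{
	return x + 1;
}

static int custom_trampoline(int x)
{
	return x * 10;
}

/* The callback does nothing except redirect the resume address. */
static void redirect_hook(struct model_regs *regs)
{
	regs->ip = custom_trampoline;
}

/*
 * Models entry into the traced function: run the hook (if any), then
 * continue at whatever address the hook left in regs->ip.
 */
static int call_traced(hook_fn hook, int arg)
{
	struct model_regs regs = { .ip = original_func };

	if (hook)
		hook(&regs);

	assert(regs.ip != NULL);
	return regs.ip(arg);
}
```

Without a hook, call_traced() runs original_func(); with redirect_hook
installed, the same call site ends up in custom_trampoline(), and the only
place that rewrites the resume address is the hook itself.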

-- Steve


> 
> There probably will be some gotchas and unforeseen issues, since prototype
> is very rough and not in reviewable form yet. Will share when it's ready.

