All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Song Liu <songliubraving@fb.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH 3/3] x86/ftrace: Use text_poke()
Date: Tue, 22 Oct 2019 16:49:22 -0700	[thread overview]
Message-ID: <20191022234921.n5nplxlyq25mksxg@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <7364B113-DD65-423D-BED3-FF90C4DF8334@amacapital.net>

On Tue, Oct 22, 2019 at 03:45:26PM -0700, Andy Lutomirski wrote:
> 
> 
> >> On Oct 22, 2019, at 2:58 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >> 
> >> On Tue, Oct 22, 2019 at 05:04:30PM -0400, Steven Rostedt wrote:
> >> I gave a solution for this. And that is to add another flag to allow
> >> for just the minimum to change the ip. And we can even add another flag
> >> to allow for changing the stack if needed (to emulate a call with the
> >> same parameters).
> > 
> > your solution is to reduce the overhead.
> > my solution is to remove it competely. See the difference?
> > 
> >> By doing this work, live kernel patching will also benefit. Because it
> >> is also dealing with the unnecessary overhead of saving regs.
> >> And we could possibly even have kprobes benefit from this if a kprobe
> >> doesn't need full regs.
> > 
> > Neither of two statements are true. The per-function generated trampoline
> > I'm talking about is bpf specific. For a function with two arguments it's just:
> > push rbp 
> > mov rbp, rsp
> > push rdi
> > push rsi
> > lea  rdi,[rbp-0x10]
> > call jited_bpf_prog
> > pop rsi
> > pop rdi
> > leave
> > ret
> 
> Why are you saving rsi?  You said upthread that you’re saving the args, but rsi is already available in rsi.

because rsi is caller saved. The above example is for probing something
like tcp_set_state(struct sock *sk, int state) that everyone used to
kprobe until we got a tracepoint there.
The main bpf prog has only one argument R1 == rdi on x86,
but it's allowed to clobber all caller saved regs.
Just like x86 function that accepts one argument in rdi can clobber rsi and others.
So it's essential to save 'sk' and 'state' for tcp_set_state()
to continue as nothing happened.

>  But I’m wondering whether the bpf jitted code could just directly access the frame instead of indirecting through a register.

That's an excellent question!
We've debated a ton whether to extend main prog from R1 to all R1-R5
like bpf subprograms allow. The problem is a lot of existing infra
assume single R1==ctx. Passing 6th argument is not defined either.
But it's nice to see all arguments of the kernel function.
Also bpf is 64-bit ISA. Even when it's running on 32-bit arch.
Just taking values from registers doesn't work there.
Whereas when args are indirectly passed as a bunch of u64s in the stack
the bpf prog becomes portable across architectures
(not 100% of course, but close).

> Is it entirely specific to the probed function? 

yes. It is specific to the probed function. The verifier makes sure
that only first two arguments are accessed as read-only
via *(u64*)(r1 + 0) and *(u64*)(r1 + 8)
But the program is allowed to use r2-r5 without saving them.
r6-r10 are saved in implicit program prologue.

> In any event, I think you can’t *just* use text_poke.  Something needs to coordinate to ensure that, if you bpf trace an already-kprobed function, the right thing happens.

Right. Not _just_. I'm looking at Peter's patches and as a minimum I need to grab
text_mutex and make sure it's nop being replaced.

> FWIW, if you are going to use a trampoline like this, consider using r11 for the caller frame instead of rsi.

r11 can be used by jited code that doing divide and multiply.
It's possible to refactor that code and free it up.
I prefer to save such micro-optimization for later.
Dealing with interpreter is also a pain to the point I'm considering
to avoid supporting it for these new things.
Folks should be using CONFIG_BPF_JIT_ALWAYS_ON anyway.


  parent reply	other threads:[~2019-10-22 23:49 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-27 18:06 [PATCH 0/3] Rewrite x86/ftrace to use text_poke() Peter Zijlstra
2019-08-27 18:06 ` [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
2019-10-03  5:00   ` Masami Hiramatsu
2019-10-03  8:27     ` Peter Zijlstra
2019-10-03 11:01       ` Peter Zijlstra
2019-10-03 12:32         ` Peter Zijlstra
2019-10-04 13:45         ` Masami Hiramatsu
2019-10-07  8:05           ` Peter Zijlstra
2019-10-09 13:07           ` x86/kprobes bug? (was: [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions) Peter Zijlstra
2019-10-09 13:26             ` Peter Zijlstra
2019-10-09 13:28               ` Peter Zijlstra
2019-10-09 14:26             ` Mathieu Desnoyers
2019-10-17 19:59               ` Peter Zijlstra
2019-10-03 13:05       ` [PATCH 1/3] x86/alternatives: Teach text_poke_bp() to emulate instructions Peter Zijlstra
2019-08-27 18:06 ` [PATCH 2/3] x86/alternatives,jump_label: Provide better text_poke() batching interface Peter Zijlstra
2019-10-02 16:34   ` Daniel Bristot de Oliveira
2019-10-03  5:50   ` Masami Hiramatsu
2019-08-27 18:06 ` [PATCH 3/3] x86/ftrace: Use text_poke() Peter Zijlstra
2019-10-02 16:35   ` Daniel Bristot de Oliveira
2019-10-02 18:21     ` Peter Zijlstra
2019-10-03 22:10       ` Steven Rostedt
2019-10-04  8:10         ` Daniel Bristot de Oliveira
2019-10-04 13:40           ` Steven Rostedt
2019-10-04 14:44             ` Daniel Bristot de Oliveira
2019-10-04 15:13               ` Steven Rostedt
2019-10-07  8:08           ` Peter Zijlstra
2019-10-11  7:01           ` Peter Zijlstra
2019-10-11  7:37             ` Daniel Bristot de Oliveira
2019-10-11 10:57               ` Peter Zijlstra
2019-10-11 13:11               ` Steven Rostedt
2019-10-04 11:22         ` Peter Zijlstra
2019-10-04 13:42           ` Steven Rostedt
2019-10-22  0:36             ` Alexei Starovoitov
2019-10-22  0:43               ` Steven Rostedt
2019-10-22  3:10                 ` Alexei Starovoitov
2019-10-22  3:16                   ` Steven Rostedt
2019-10-22  3:19                     ` Steven Rostedt
2019-10-22  4:05                       ` Alexei Starovoitov
2019-10-22 11:19                         ` Steven Rostedt
2019-10-22 13:44                           ` Steven Rostedt
2019-10-22 17:50                             ` Alexei Starovoitov
2019-10-22 18:10                               ` Steven Rostedt
2019-10-22 20:46                                 ` Alexei Starovoitov
2019-10-22 21:04                                   ` Steven Rostedt
2019-10-22 21:58                                     ` Alexei Starovoitov
2019-10-22 22:17                                       ` Steven Rostedt
2019-10-23  2:02                                         ` Steven Rostedt
2019-10-22 22:45                                       ` Andy Lutomirski
2019-10-22 23:21                                         ` Steven Rostedt
2019-10-22 23:49                                         ` Alexei Starovoitov [this message]
2019-10-23  4:20                                           ` Andy Lutomirski
2019-10-23  9:02                                             ` Peter Zijlstra
2019-10-23 16:23                                       ` Steven Rostedt
2019-10-23 17:42                                         ` Steven Rostedt
2019-10-23 19:34                                         ` Alexei Starovoitov
2019-10-23 20:08                                           ` Steven Rostedt
2019-10-23 22:36                                             ` Alexei Starovoitov
2019-10-22  3:55                     ` Alexei Starovoitov
2019-10-03  5:52     ` Masami Hiramatsu
2019-08-28  7:22 ` [PATCH 0/3] Rewrite x86/ftrace to use text_poke() Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191022234921.n5nplxlyq25mksxg@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=bristot@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.