linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Daniel Xu <dxu@dxuuu.xyz>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>, X86 ML <x86@kernel.org>,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	kuba@kernel.org, mingo@redhat.com, ast@kernel.org,
	tglx@linutronix.de, kernel-team@fb.com, yhs@fb.com,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes
Date: Mon, 8 Mar 2021 11:52:10 +0900	[thread overview]
Message-ID: <20210308115210.732f2c42bf347c15fbb2a828@kernel.org> (raw)
In-Reply-To: <20210307212333.7jqmdnahoohpxabn@maharaja.localdomain>

On Sun, 7 Mar 2021 13:23:33 -0800
Daniel Xu <dxu@dxuuu.xyz> wrote:

> > kretprobe replaces the real return address with kretprobe_trampoline
> > and kretprobe_trampoline *calls* trampoline_handler (this part depends
> > on architecture implementation).
> > Thus, if kretprobe_trampoline has no stack frame information, ORC may
> > fails at the first kretprobe_trampoline+0x25, that is different from
> > the kretprobe_trampoline+0, so the hack doesn't work.
> 
> I'm not sure I follow 100% what you're saying, but assuming you're
> asking why bpftrace fails at `kretprobe_trampoline+0` instead of
> `kretprobe_trampoline+0x25`:
> 
> `regs` is set to &kretprobe_trampoline:
> 
>     regs->ip = (unsigned long)&kretprobe_trampoline;

Ah, OK. bpftrace does the adjustment.

> Then the kretprobe event is dispatched like this:
> 
>     kretprobe_trampoline_handler
>       __kretprobe_trampoline_handler
>         rp->handler // ie kernel/trace/trace_kprobe.c:kretprobe_dispatcher
>           kretprobe_perf_func
>             trace_bpf_call(.., regs)
>               BPF_PROG_RUN_ARRAY_CHECK
>                 bpf_get_stackid(regs, .., ..) // in bpftrace prog 
> 
> And then `bpf_get_stackid` unwinds the stack via:
> 
>     bpf_get_stackid
>       get_perf_callchain(regs, ...)
>         perf_callchain_kernel(.., regs)
>           perf_callchain_store(.., regs->ip) // !!! first unwound entry
>           unwind_start
>           unwind_next_frame
> 
> In summary: unwinding via BPF begins at regs->ip, which
> `trampoline_handler` sets to `&kretprobe_trampoline+0x0`.

OK, maybe you are using stack_trace_save_regs() with pt_regs instead of
stack_trace_save(). this means it started from regs at saved by
kretprobe always.

In the ftrace, we are using stack_trace_save() which is based on
the current stack, this means stack unwinder tracks back the stack of
kretprobe itself at first. So it saw the kretprobe_trampoline+0x25 
(return address of the trampoline_handler) and stop unwinding because
the call frame information (ORC information) and the return address
are not there.

This issue is not only the ftrace, but also backtrace in interrupt
handler and kretprobe handler.

> > Hmm, how the other inline-asm function makes its stackframe info?
> > If I get the kretprobe_trampoline+0, I can fix it up.
> 
> So I think I misunderstood the mechanics before. I think `call
> trampoline_handler` does set up a new frame. My current guess is that
> ftrace doesn't thread through `regs` like the bpf code path does. I'm
> not very familiar with ftrace internals so I haven't looked.

Yes, that's right. Since I made a kretprobe event on top of the ftrace
event framework, it doesn't pass the regs to the event trigger.


> > > The only way I can think of to fix this issue is to make the ORC
> > > unwinder aware of kretprobe (ie the patch I sent earlier). I'm hoping
> > > you have another idea if my patch isn't acceptable.
> > 
> > OK, anyway, your patch doesn't care the case that the multiple functions
> > are probed by kretprobes. In that case, you'll see several entries are
> > replaced by the kretprobe_trampoline. To find it correctly, you have
> > to pass a state-holder (@cur of the kretprobe_find_ret_addr())
> > to the fixup entries.
> 
> I'll see if I can figure something out tomorrow.

To help your understanding, let me explain.

If we have a code here

caller_func:
0x00 add sp, 0x20	/* 0x20 bytes stack frame allocated */
...
0x10 call target_func
0x15 ... /* return address */

On the stack in the entry of target_func, we have

[stack]
0x0e0 caller_func+0x15
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

And when we put a kretprobe on the target_func, the stack will be

[stack]
0x0e0 kretprobe_trampoline
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

* "caller_func+0x15" is saved in current->kretprobe_instances.first.

When returning from the target_func, call consumed the 0x0e0 and
jump to kretprobe_trampoline. Let's see the assembler code.

        ".text\n"
        ".global kretprobe_trampoline\n"
        ".type kretprobe_trampoline, @function\n"
        "kretprobe_trampoline:\n"
        /* We don't bother saving the ss register */
        "       pushq %rsp\n"
        "       pushfq\n"
        SAVE_REGS_STRING
        "       movq %rsp, %rdi\n"
        "       call trampoline_handler\n"
        /* Replace saved sp with true return address. */
        "       movq %rax, 19*8(%rsp)\n"
        RESTORE_REGS_STRING
        "       popfq\n"
        "       ret\n"

When the entry of trampoline_handler, stack is like this;

[stack]
0x040 kretprobe_trampoline+0x25
0x048 r15
...     /* pt_regs */
0x0d8 flags
0x0e0 rsp (=0x0e0)
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */

And after returned from trampoline_handler, "movq" changes the
stack like this.

[stack]
0x040 kretprobe_trampoline+0x25
0x048 r15
...     /* pt_regs */
0x0d8 flags
0x0e0 caller_func+0x15
... /* 0x20 bytes = 4 entries  are stack frame of caller_func */
0x100 /* caller_func return address */


So at the kretprobe handler, we have 2 issues.
1) the return address (caller_func+0x15) is not on the stack.
   this can be solved by searching from current->kretprobe_instances.
2) the stack frame size of kretprobe_trampoline is unknown
   Since the stackframe is fixed, the fixed number (0x98) can be used.

However, those solutions are only for the kretprobe handler. The stacktrace
from interrupt handler hit in the kretprobe_trampoline still doesn't work.

So, here is my idea;

1) Change the trampline code to prepare stack frame at first and save
   registers on it, instead of "push". This will makes ORC easy to setup
   stackframe information for this code.
2) change the return addres fixup timing. Instead of using return value
   of trampoline handler, before removing the real return address from
   current->kretprobe_instances.
3) Then, if orc_find() finds the ip is in the kretprobe_trampoline, it
   checks the contents of the end of stackframe (at the place of regs->sp)
   is same as the address of it. If it is, it can find the correct address
   from current->kretprobe_instances. If not, there is a correct address.

I need to find how the ORC info is prepared for 1), maybe UNWIND_HINT macro
may help?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

  reply	other threads:[~2021-03-08  2:53 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-05 15:38 [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 1/5] ia64: kprobes: Fix to pass correct trampoline address to the handler Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 2/5] kprobes: treewide: Replace arch_deref_entry_point() with dereference_function_descriptor() Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 3/5] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler() Masami Hiramatsu
2021-03-10 14:21   ` Miroslav Benes
2021-03-10 15:42     ` Masami Hiramatsu
2021-03-11 13:37       ` Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 4/5] kprobes: stacktrace: Recover the address changed by kretprobe Masami Hiramatsu
2021-03-05 15:39 ` [PATCH -tip 5/5] tracing: Remove kretprobe unknown indicator from stacktrace Masami Hiramatsu
2021-03-05 15:44   ` Steven Rostedt
2021-03-05 19:16 ` [PATCH -tip 0/5] kprobes: Fix stacktrace in kretprobes Daniel Xu
2021-03-06  1:13   ` Masami Hiramatsu
2021-03-07 21:23     ` Daniel Xu
2021-03-08  2:52       ` Masami Hiramatsu [this message]
2021-03-08 13:05         ` Masami Hiramatsu
2021-03-09  1:19         ` Josh Poimboeuf
2021-03-10  9:57           ` Masami Hiramatsu
2021-03-10 15:08             ` Josh Poimboeuf
2021-03-10 15:55               ` Masami Hiramatsu
2021-03-10 18:31                 ` Josh Poimboeuf
2021-03-11  0:20                   ` Masami Hiramatsu
2021-03-11  1:06                     ` Josh Poimboeuf
2021-03-11  1:54                       ` Masami Hiramatsu
2021-03-11 16:51                         ` Josh Poimboeuf
2021-03-12  3:22                           ` Masami Hiramatsu
2021-03-10 22:46                 ` Daniel Xu
2021-03-09 21:34         ` Daniel Xu
2021-03-10 10:50           ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210308115210.732f2c42bf347c15fbb2a828@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dxu@dxuuu.xyz \
    --cc=jpoimboe@redhat.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).