All of lore.kernel.org
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Kairui Song <kasong@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"Josh Poimboeuf" <jpoimboe@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: Getting empty callchain from perf_callchain_kernel()
Date: Mon, 20 May 2019 17:19:22 +0000	[thread overview]
Message-ID: <BB5859B6-6686-474A-8CB2-FAC183C92829@fb.com> (raw)
In-Reply-To: <CACPcB9cpNp5CBqoRs+XMCwufzAFa8Pj-gbmj9fb+g5wVdue=ig@mail.gmail.com>

Sorry for previous empty email.. Clicked send by accident. 

> On May 19, 2019, at 11:06 AM, Kairui Song <kasong@redhat.com> wrote:
> 
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>> 
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
> 
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
> 
> But if the stack trace is started withing a bpf func call then it's broken...
> 
> If the unwinder could trace back through the bpf func call then there
> will be no such problem.
> 
> For frame pointer unwinder, set the indicator flag (X86_EFLAGS_FIXED)
> before bpf call, and ensure bp is also dumped could fix it (so it will
> start using the regs for bpf calls, like before the commit
> d15d356887e7). But for ORC I don't see a clear way to fix the problem.
> First though is maybe dump some callee's regs for ORC (IP, BP, SP, DI,
> DX, R10, R13, else?) in the trace point handler, then use the flag to
> indicate ORC to do one more unwind (because can't get caller's regs,
> so get callee's regs instaed) before actually giving output?
> 
> I had a try, for framepointer unwinder, mark the indicator flag before
> calling bpf functions, and dump bp as well in the trace point. Then
> with frame pointer, it works, test passed:
> 
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 1392d5e6e8d6..6f1192e9776b 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
> pt_regs *regs);
> 
> #include <asm/stacktrace.h>
> 
> +#ifdef CONFIG_FRAME_POINTER
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +       return (unsigned long)__builtin_frame_address(1);
> +}
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +       return 0;
> +}
> +#endif
> +
> /*
>  * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
>  * and the comment with PERF_EFLAGS_EXACT.
>  */
> #define perf_arch_fetch_caller_regs(regs, __ip)                {       \
>        (regs)->ip = (__ip);                                    \
> +       (regs)->bp = caller_frame_pointer();                    \
>        (regs)->sp = (unsigned long)__builtin_frame_address(0); \
>        (regs)->cs = __KERNEL_CS;                               \
>        regs->flags = 0;                                        \
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..ca7b95ee74f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
> int size, int rctx,
>                               struct task_struct *task)
> {
>        if (bpf_prog_array_valid(call)) {
> +               regs->flags |= X86_EFLAGS_FIXED;
>                *(struct pt_regs **)raw_data = regs;
>                if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
>                        perf_swevent_put_recursion_context(rctx);
> @@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event *event,
>        int ret = 0;
> 
>        ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> +       ctx.regs->flags |= X86_EFLAGS_FIXED;
> +
>        preempt_disable();
>        if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
>                goto out;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..e1fa656677dc 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
> flags, void *meta, u64 meta_size,
>        };
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        perf_sample_data_init(sd, 0, 0);
>        sd->raw = &raw;
> 
> @@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        return ____bpf_perf_event_output(regs, map, flags, data, size);
> }
> 
> @@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        /* similar to bpf_perf_event_output_tp, but pt_regs fetched
> differently */
>        return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
>                               flags, 0, 0);
> @@ -871,6 +877,8 @@ BPF_CALL_4(bpf_get_stack_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
>                             (unsigned long) size, flags, 0);
> }
> 

I think we really cannot do something above, as it leaks x86 specific
code into kernel/events and kernel/trace. 

> And *_raw_tp functions will fetch the regs by themselves so a bit
> trouble some...
> 
> ----------
> 
> And another approach is to make unwinder direct unwinding works when
> called by bpf (if possible and reasonable). I also had a nasty hacky
> experiment (posted below) to just force frame pointer unwinder's
> get_stack_info pass for bpf, then problem is fixed without any other
> workaround:
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..c0cfdf25f5ed 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -166,7 +166,8 @@ int get_stack_info(unsigned long *stack, struct
> task_struct *task,
>        if (in_entry_stack(stack, info))
>                goto recursion_check;
> 
> -       goto unknown;
> +       goto recursion_check;
> 
> recursion_check:
>        /*

I think this one doesn't work for ORC either? 

Thanks,
Song

> Don't know how to do it the right way, or is it even possible for all
> unwinders yet...
> 
> -- 
> Best Regards,
> Kairui Song


  parent reply	other threads:[~2019-05-20 17:20 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-16 23:51 Getting empty callchain from perf_callchain_kernel() Song Liu
2019-05-17  7:46 ` Peter Zijlstra
2019-05-17  8:10   ` Peter Zijlstra
2019-05-17  8:15     ` Kairui Song
2019-05-17  8:32       ` Kairui Song
2019-05-17 16:22         ` Song Liu
2019-05-17  9:10       ` Peter Zijlstra
2019-05-17 18:40         ` Song Liu
2019-05-17 21:06           ` Alexei Starovoitov
2019-05-17 21:48             ` Song Liu
2019-05-19 18:07               ` Kairui Song
2019-05-20 17:22                 ` Song Liu
2019-05-22 13:51                   ` Peter Zijlstra
2019-05-19 18:06         ` Kairui Song
2019-05-20 17:16           ` Song Liu
2019-05-20 17:19           ` Song Liu [this message]
2019-05-22 14:02           ` Peter Zijlstra
2019-05-22 14:49             ` Alexei Starovoitov
2019-05-22 17:45               ` Josh Poimboeuf
2019-05-22 23:46                 ` Josh Poimboeuf
2019-05-23  6:48                   ` Kairui Song
2019-05-23  8:27                     ` Song Liu
2019-05-23  9:11                       ` Kairui Song
2019-05-23 13:32                     ` Josh Poimboeuf
2019-05-23 14:50                       ` Kairui Song
2019-05-23 15:24                         ` Josh Poimboeuf
2019-05-23 16:41                           ` Kairui Song
2019-05-23 17:27                             ` Josh Poimboeuf
2019-05-24  2:20                               ` Kairui Song
2019-05-24 23:23                                 ` Josh Poimboeuf
2019-05-27 11:57                                   ` Kairui Song
2019-06-06 16:04                                     ` Song Liu
2019-06-06 23:58                                       ` Josh Poimboeuf
2019-06-11 21:03                                       ` Josh Poimboeuf
2019-05-24  8:53                           ` Peter Zijlstra
2019-05-24 13:05                             ` Josh Poimboeuf
2019-06-12  3:05                             ` Josh Poimboeuf
2019-06-12  8:54                               ` Peter Zijlstra
2019-06-12 14:50                                 ` Josh Poimboeuf
2019-06-13 20:26                                   ` Josh Poimboeuf
2019-06-12 13:10                               ` Steven Rostedt
2019-06-12 14:26                                 ` Josh Poimboeuf
2019-05-22 18:07       ` Josh Poimboeuf
2019-05-22 21:55         ` Alexei Starovoitov
2019-05-17 16:32     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BB5859B6-6686-474A-8CB2-FAC183C92829@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jpoimboe@redhat.com \
    --cc=kasong@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.