bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Kairui Song <kasong@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"Josh Poimboeuf" <jpoimboe@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: Getting empty callchain from perf_callchain_kernel()
Date: Mon, 20 May 2019 17:16:07 +0000	[thread overview]
Message-ID: <366B9124-728E-4985-9649-FDA64CDDF1FB@fb.com> (raw)
In-Reply-To: <CACPcB9cpNp5CBqoRs+XMCwufzAFa8Pj-gbmj9fb+g5wVdue=ig@mail.gmail.com>



> On May 19, 2019, at 11:06 AM, Kairui Song <kasong@redhat.com> wrote:
> 
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>> 
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
> 
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
> 
> But if the stack trace is started withing a bpf func call then it's broken...
> 
> If the unwinder could trace back through the bpf func call then there
> will be no such problem.
> 
> For frame pointer unwinder, set the indicator flag (X86_EFLAGS_FIXED)
> before bpf call, and ensure bp is also dumped could fix it (so it will
> start using the regs for bpf calls, like before the commit
> d15d356887e7). But for ORC I don't see a clear way to fix the problem.
> First though is maybe dump some callee's regs for ORC (IP, BP, SP, DI,
> DX, R10, R13, else?) in the trace point handler, then use the flag to
> indicate ORC to do one more unwind (because can't get caller's regs,
> so get callee's regs instaed) before actually giving output?
> 
> I had a try, for framepointer unwinder, mark the indicator flag before
> calling bpf functions, and dump bp as well in the trace point. Then
> with frame pointer, it works, test passed:
> 
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 1392d5e6e8d6..6f1192e9776b 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
> pt_regs *regs);
> 
> #include <asm/stacktrace.h>
> 
> +#ifdef CONFIG_FRAME_POINTER
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +       return (unsigned long)__builtin_frame_address(1);
> +}
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> +       return 0;
> +}
> +#endif
> +
> /*
>  * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
>  * and the comment with PERF_EFLAGS_EXACT.
>  */
> #define perf_arch_fetch_caller_regs(regs, __ip)                {       \
>        (regs)->ip = (__ip);                                    \
> +       (regs)->bp = caller_frame_pointer();                    \
>        (regs)->sp = (unsigned long)__builtin_frame_address(0); \
>        (regs)->cs = __KERNEL_CS;                               \
>        regs->flags = 0;                                        \
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..ca7b95ee74f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
> int size, int rctx,
>                               struct task_struct *task)
> {
>        if (bpf_prog_array_valid(call)) {
> +               regs->flags |= X86_EFLAGS_FIXED;
>                *(struct pt_regs **)raw_data = regs;
>                if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
>                        perf_swevent_put_recursion_context(rctx);
> @@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event *event,
>        int ret = 0;
> 
>        ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> +       ctx.regs->flags |= X86_EFLAGS_FIXED;
> +
>        preempt_disable();
>        if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
>                goto out;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..e1fa656677dc 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
> flags, void *meta, u64 meta_size,
>        };
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        perf_sample_data_init(sd, 0, 0);
>        sd->raw = &raw;
> 
> @@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        return ____bpf_perf_event_output(regs, map, flags, data, size);
> }
> 
> @@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        /* similar to bpf_perf_event_output_tp, but pt_regs fetched
> differently */
>        return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
>                               flags, 0, 0);
> @@ -871,6 +877,8 @@ BPF_CALL_4(bpf_get_stack_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
>        struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
> 
>        perf_fetch_caller_regs(regs);
> +       regs->flags |= X86_EFLAGS_FIXED;
> +
>        return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
>                             (unsigned long) size, flags, 0);
> }
> 
> And *_raw_tp functions will fetch the regs by themselves so a bit
> trouble some...
> 
> ----------
> 
> And another approach is to make unwinder direct unwinding works when
> called by bpf (if possible and reasonable). I also had a nasty hacky
> experiment (posted below) to just force frame pointer unwinder's
> get_stack_info pass for bpf, then problem is fixed without any other
> workaround:
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..c0cfdf25f5ed 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -166,7 +166,8 @@ int get_stack_info(unsigned long *stack, struct
> task_struct *task,
>        if (in_entry_stack(stack, info))
>                goto recursion_check;
> 
> -       goto unknown;
> +       goto recursion_check;
> 
> recursion_check:
>        /*
> 
> Don't know how to do it the right way, or is it even possible for all
> unwinders yet...
> 
> -- 
> Best Regards,
> Kairui Song


  reply	other threads:[~2019-05-20 17:17 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <3CD3EE63-0CD2-404A-A403-E11DCF2DF8D9@fb.com>
     [not found] ` <20190517074600.GJ2623@hirez.programming.kicks-ass.net>
     [not found]   ` <20190517081057.GQ2650@hirez.programming.kicks-ass.net>
     [not found]     ` <CACPcB9cB5n1HOmZcVpusJq8rAV5+KfmZ-Lxv3tgsSoy7vNrk7w@mail.gmail.com>
     [not found]       ` <20190517091044.GM2606@hirez.programming.kicks-ass.net>
2019-05-17 18:40         ` Getting empty callchain from perf_callchain_kernel() Song Liu
2019-05-17 21:06           ` Alexei Starovoitov
2019-05-17 21:48             ` Song Liu
2019-05-19 18:07               ` Kairui Song
2019-05-20 17:22                 ` Song Liu
2019-05-22 13:51                   ` Peter Zijlstra
2019-05-19 18:06         ` Kairui Song
2019-05-20 17:16           ` Song Liu [this message]
2019-05-20 17:19           ` Song Liu
2019-05-22 14:02           ` Peter Zijlstra
2019-05-22 14:49             ` Alexei Starovoitov
2019-05-22 17:45               ` Josh Poimboeuf
2019-05-22 23:46                 ` Josh Poimboeuf
2019-05-23  6:48                   ` Kairui Song
2019-05-23  8:27                     ` Song Liu
2019-05-23  9:11                       ` Kairui Song
2019-05-23 13:32                     ` Josh Poimboeuf
2019-05-23 14:50                       ` Kairui Song
2019-05-23 15:24                         ` Josh Poimboeuf
2019-05-23 16:41                           ` Kairui Song
2019-05-23 17:27                             ` Josh Poimboeuf
2019-05-24  2:20                               ` Kairui Song
2019-05-24 23:23                                 ` Josh Poimboeuf
2019-05-27 11:57                                   ` Kairui Song
2019-06-06 16:04                                     ` Song Liu
2019-06-06 23:58                                       ` Josh Poimboeuf
2019-06-11 21:03                                       ` Josh Poimboeuf
2019-05-24  8:53                           ` Peter Zijlstra
2019-05-24 13:05                             ` Josh Poimboeuf
2019-06-12  3:05                             ` Josh Poimboeuf
2019-06-12  8:54                               ` Peter Zijlstra
2019-06-12 14:50                                 ` Josh Poimboeuf
2019-06-13 20:26                                   ` Josh Poimboeuf
2019-06-12 13:10                               ` Steven Rostedt
2019-06-12 14:26                                 ` Josh Poimboeuf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=366B9124-728E-4985-9649-FDA64CDDF1FB@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jpoimboe@redhat.com \
    --cc=kasong@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).