From: Alexei Starovoitov <ast@fb.com>
To: Josh Poimboeuf <jpoimboe@redhat.com>, Kairui Song <kasong@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Song Liu <songliubraving@fb.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: Getting empty callchain from perf_callchain_kernel()
Date: Wed, 22 May 2019 21:55:08 +0000
Message-ID: <c517f213-01d5-b95b-1a4c-5dddedd71419@fb.com>
In-Reply-To: <20190522180749.qpwdlhkcitxiazco@treble>

On 5/22/19 11:07 AM, Josh Poimboeuf wrote:
> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>> On Fri, May 17, 2019 at 4:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> On Fri, May 17, 2019 at 09:46:00AM +0200, Peter Zijlstra wrote:
>>>> On Thu, May 16, 2019 at 11:51:55PM +0000, Song Liu wrote:
>>>>> Hi,
>>>>>
>>>>> We found a failure with selftests/bpf/test_progs in test_stacktrace_map (on bpf/master
>>>>> branch).
>>>>>
>>>>> After digging into the code, we found that perf_callchain_kernel() is returning an empty
>>>>> callchain for the sched/sched_switch tracepoint. It seems related to commit
>>>>>
>>>>> d15d356887e770c5f2dcf963b52c7cb510c9e42d
>>>>> ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
>>>>>
>>>>> Before this commit, perf_callchain_kernel() returned a callchain containing regs->ip. With
>>>>> this commit, regs->ip is no longer stored in the !perf_hw_regs(regs) case.
>>>>
>>>> So while I think the below is indeed right; we should store regs->ip
>>>> regardless of the unwind path chosen, I still think there's something
>>>> fishy if this results in just the 1 entry.
>>>>
>>>> The sched/sched_switch event really should have a non-trivial stack.
>>>>
>>>> Let me see if I can reproduce with just perf.
>>>
>>> $ perf record -g -e "sched:sched_switch" -- make clean
>>> $ perf report -D
>>>
>>> 12 904071759467 0x1790 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 7236/7236: 0xffffffff81c29562 period: 1 addr: 0
>>> ... FP chain: nr:10
>>> .....  0: ffffffffffffff80
>>> .....  1: ffffffff81c29562
>>> .....  2: ffffffff81c29933
>>> .....  3: ffffffff8111f688
>>> .....  4: ffffffff81120b9d
>>> .....  5: ffffffff81120ce5
>>> .....  6: ffffffff8100254a
>>> .....  7: ffffffff81e0007d
>>> .....  8: fffffffffffffe00
>>> .....  9: 00007f9b6cd9682a
>>> ... thread: sh:7236
>>> ...... dso: /lib/modules/5.1.0-12177-g41bbb9129767/build/vmlinux
>>>
>>>
>>> IOW, it seems to 'work'.
>>>
>>
>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>> some other bpf functions) is now broken; that is, starting an unwind
>> directly inside a bpf program goes wrong. It produces the following
>> kernel message:
>>
>> WARNING: kernel stack frame pointer at 0000000070cad07c in test_progs:1242 has bad value 00000000ffd4497e
>>
>> And in the debug message:
>>
>> [  160.460287] 000000006e117175: ffffffffaa23a0e8 (get_perf_callchain+0x148/0x280)
>> [  160.460287] 0000000002e8715f: 0000000000c6bba0 (0xc6bba0)
>> [  160.460288] 00000000b3d07758: ffff9ce3f9790000 (0xffff9ce3f9790000)
>> [  160.460289] 0000000055c97836: ffff9ce3f9790000 (0xffff9ce3f9790000)
>> [  160.460289] 000000007cbb140a: 000000010000007f (0x10000007f)
>> [  160.460290] 000000007dc62ac9: 0000000000000000 ...
>> [  160.460290] 000000006b41e346: 1c7da01cd70c4000 (0x1c7da01cd70c4000)
>> [  160.460291] 00000000f23d1826: ffffd89cffc3ae80 (0xffffd89cffc3ae80)
>> [  160.460292] 00000000f9a16017: 000000000000007f (0x7f)
>> [  160.460292] 00000000a8e62d44: 0000000000000000 ...
>> [  160.460293] 00000000cbc83c97: ffffb89d00d8d000 (0xffffb89d00d8d000)
>> [  160.460293] 0000000092842522: 000000000000007f (0x7f)
>> [  160.460294] 00000000dafbec89: ffffb89d00c6bc50 (0xffffb89d00c6bc50)
>> [  160.460296] 000000000777e4cf: ffffffffaa225d97 (bpf_get_stackid+0x77/0x470)
>> [  160.460296] 000000009942ea16: 0000000000000000 ...
>> [  160.460297] 00000000a08006b1: 0000000000000001 (0x1)
>> [  160.460298] 000000009f03b438: ffffb89d00c6bc30 (0xffffb89d00c6bc30)
>> [  160.460299] 000000006caf8b32: ffffffffaa074fe8 (__do_page_fault+0x58/0x90)
>> [  160.460300] 000000003a13d702: 0000000000000000 ...
>> [  160.460300] 00000000e2e2496d: ffff9ce300000000 (0xffff9ce300000000)
>> [  160.460301] 000000008ee6b7c2: ffffd89cffc3ae80 (0xffffd89cffc3ae80)
>> [  160.460301] 00000000a8cf6d55: 0000000000000000 ...
>> [  160.460302] 0000000059078076: ffffd89cffc3ae80 (0xffffd89cffc3ae80)
>> [  160.460303] 00000000c6aac739: ffff9ce3f1e18eb0 (0xffff9ce3f1e18eb0)
>> [  160.460303] 00000000a39aff92: ffffb89d00c6bc60 (0xffffb89d00c6bc60)
>> [  160.460305] 0000000097498bf2: ffffffffaa1f4791 (bpf_get_stackid_tp+0x11/0x20)
>> [  160.460306] 000000006992de1e: ffffb89d00c6bc78 (0xffffb89d00c6bc78)
>> [  160.460307] 00000000dacd0ce5: ffffffffc0405676 (0xffffffffc0405676)
>> [  160.460307] 00000000a81f2714: 0000000000000000 ...
>>
>> # Note: this is the invalid frame pointer
>> [  160.460308] 0000000070cad07c: ffffb89d00a1d000 (0xffffb89d00a1d000)
>> [  160.460308] 00000000f8f194e4: 0000000000000001 (0x1)
>> [  160.460309] 000000002134f42a: ffffd89cffc3ae80 (0xffffd89cffc3ae80)
>> [  160.460310] 00000000f9345889: ffff9ce3f1e18eb0 (0xffff9ce3f1e18eb0)
>> [  160.460310] 000000008ad22a42: 0000000000000000 ...
>> [  160.460311] 0000000073808173: ffffb89d00c6bce0 (0xffffb89d00c6bce0)
>> [  160.460312] 00000000c9effff4: ffffffffaa1f48d1 (trace_call_bpf+0x81/0x100)
>> [  160.460313] 00000000c5d8ebd1: ffffb89d00c6bcc0 (0xffffb89d00c6bcc0)
>> [  160.460315] 00000000bce0b072: ffffffffab651be0 (event_sched_migrate_task+0xa0/0xa0)
>> [  160.460316] 00000000355cf319: 0000000000000000 ...
>> [  160.460316] 000000003b67f2ad: ffffd89cffc3ae80 (0xffffd89cffc3ae80)
>> [  160.460316] 000000009a77e20b: ffff9ce3fba25000 (0xffff9ce3fba25000)
>> [  160.460317] 0000000032cf7376: 0000000000000001 (0x1)
>> [  160.460317] 000000000051db74: ffffb89d00c6bd20 (0xffffb89d00c6bd20)
>> [  160.460318] 0000000040eb3bf7: ffffffffaa22be81 (perf_trace_run_bpf_submit+0x41/0xb0)
> 
> Is there an easy way to recreate this?
> 

The failure I care about can be reproduced with:

cd tools/testing/selftests/bpf
make
./test_progs
test_stacktrace_map:FAIL:compare_map_keys stackid_hmap vs. stackmap err -1 errno 2
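
For readers decoding the failure line: errno 2 is ENOENT, i.e. a map operation found no entry. The sketch below is a toy Python model, not the selftest code. It assumes compare_map_keys walks the keys of the first map and looks each one up in the second, and it uses plain dicts as hypothetical stand-ins for the BPF maps. Under that assumption, both an empty stackid_hmap and a stack id missing from stackmap would produce the same "err -1 errno 2".

```python
import errno


def compare_map_keys(map1, map2):
    """Toy model: return (err, errno) the way the failure line reports them.

    Assumption: the real helper iterates the keys of map1 and looks each
    one up in map2; dicts stand in for the BPF maps here.
    """
    keys = list(map1)
    if not keys:
        # Nothing to iterate: fetching the first key fails with ENOENT.
        return -1, errno.ENOENT
    for key in keys:
        if key not in map2:
            # The lookup in the second map fails with ENOENT.
            return -1, errno.ENOENT
    return 0, 0


def describe(err, code):
    """Format the pair in the same shape as the selftest output."""
    return f"err {err} errno {code}"


# Hypothetical contents, for illustration only.
stackid_hmap = {7: 1, 12: 1}
stackmap_full = {7: ["ip0", "ip1"], 12: ["ip0"]}
stackmap_partial = {7: ["ip0", "ip1"]}

print(describe(*compare_map_keys(stackid_hmap, stackmap_full)))
print(describe(*compare_map_keys(stackid_hmap, stackmap_partial)))
print(describe(*compare_map_keys({}, {})), errno.errorcode[errno.ENOENT])
```

If the callchain is empty and the stack id is never recorded, the empty-map case is the more likely match for this failure, but that is an inference rather than something shown in the output above.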

