bpf.vger.kernel.org archive mirror
From: Yonghong Song <yhs@fb.com>
To: Eugene Loh <eugene.loh@oracle.com>, <bpf@vger.kernel.org>
Subject: Re: using skip>0 with bpf_get_stack()
Date: Mon, 28 Jun 2021 20:33:11 -0700
Message-ID: <4e2e5738-b103-d340-753e-7e37e06304c4@fb.com>
In-Reply-To: <1b59751f-0bb1-a4ad-6548-2536e60a80ec@oracle.com>



On 6/25/21 6:22 PM, Eugene Loh wrote:
> 
> On 6/1/21 5:48 PM, Yonghong Song wrote:
>>
>>
>> On 5/28/21 3:16 PM, Eugene Loh wrote:
>>> I have a question about bpf_get_stack(). I'm interested in the case
>>>          skip > 0
>>>          user_build_id == 0
>>>          num_elem < sysctl_perf_event_max_stack
>>>
>>> The function sets
>>>          init_nr = sysctl_perf_event_max_stack - num_elem;
>>> which means that get_perf_callchain() will return "num_elem" stack 
>>> frames.  Then, since we skip "skip" frames, we'll fill the user 
>>> buffer with only "num_elem - skip" frames, the remaining frames being 
>>> filled zero.
>>>
>>> For example, let's say the call stack is
>>>          leaf <- caller <- foo1 <- foo2 <- foo3 <- foo4 <- foo5 <- foo6
>>>
>>> Let's say I pass bpf_get_stack() a buffer with num_elem==4 and ask 
>>> skip==2.  I would expect to skip 2 frames then get 4 frames, getting 
>>> back:
>>>          foo1  foo2  foo3  foo4
>>>
>>> Instead, I get
>>>          foo1  foo2  0  0
>>> skipping 2 frames but also leaving frames zeroed out.
>>
>> Thanks for reporting. I looked at the code and it does seem that we may
>> have a kernel bug when skip != 0. Basically, as you described,
>> we initially collect num_elem stack frames and later reduce that by skip,
>> so we end up with num_elem - skip frames, as you calculated above.
>>
>>>
>>> I think the init_nr computation should be:
>>>
>>> -       if (sysctl_perf_event_max_stack < num_elem)
>>> +       if (sysctl_perf_event_max_stack <= num_elem + skip)
>>>                  init_nr = 0;
>>>          else
>>> -               init_nr = sysctl_perf_event_max_stack - num_elem;
>>> +               init_nr = sysctl_perf_event_max_stack - num_elem - skip;
>>
>> A rough check looks like this is one correct way to fix the issue.
>>
>>> Incidentally, the return value of the function is presumably the size 
>>> of the returned data.  Would it make sense to say so in 
>>> include/uapi/linux/bpf.h?
>>
>> The current documentation says:
>>  *      Return
>>  *              A non-negative value equal to or less than *size* on success,
>>  *              or a negative error in case of failure.
>>
>> I think you can improve it with a more precise description, such as
>> "a non-negative value for the copied **buf** length".
>>
>> Could you submit a patch for this? Thanks!
> 
> Sure.  Thanks for looking at this and sorry about the long delay getting 
> back to you.
> 
> Could you take a look at the attached, proposed patch?  As you see in 
> the commit message, I'm unclear about the bpf_get_stack*_pe() variants. 
> They might use an earlier-constructed callchain, and I do not know how
> init_nr was set for them.

I think the bpf_get_stackid() and __bpf_get_stackid() implementations are
correct. Did you find any issues?

For bpf_get_stack_pe(), see:
https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com/
I think you should not change the bpf_get_stack() function.
__bpf_get_stack() is used by both bpf_get_stack() and bpf_get_stack_pe().
In bpf_get_stack_pe(), the callchain is fetched by the perf event
infrastructure if event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY
is true.

Just focus on __bpf_get_stack(). We could factor __bpf_get_stackid(),
but unless we have a bug there, I don't see that it is necessary.
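
For reference, the init_nr arithmetic quoted above can be modeled in user
space. What follows is only a minimal sketch, not kernel code:
model_get_stack(), its parameters, and the numeric frame IDs are hypothetical
names made up for illustration. It assumes that callchain collection simply
yields the first (max_stack - init_nr) frames of the stack, and it returns -1
where the kernel would return a negative error.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space model of the init_nr computation.
 * frames[0..depth) is the full call stack, innermost frame first.
 * fixed == 0 uses the original computation, fixed != 0 the proposed one.
 * Returns the number of bytes of valid data copied into buf, or -1 if
 * fewer than "skip" frames were collected.
 */
static long model_get_stack(const uint64_t *frames, uint32_t depth,
                            uint32_t max_stack, uint64_t *buf,
                            uint32_t num_elem, uint32_t skip, int fixed)
{
    uint32_t init_nr, collected, trace_nr;

    if (fixed) {
        /* Proposed: leave room for the skipped frames as well. */
        if (max_stack <= num_elem + skip)
            init_nr = 0;
        else
            init_nr = max_stack - num_elem - skip;
    } else {
        /* Original: room for num_elem frames only. */
        if (max_stack < num_elem)
            init_nr = 0;
        else
            init_nr = max_stack - num_elem;
    }

    /* Assumption: collection yields at most (max_stack - init_nr) frames. */
    collected = max_stack - init_nr;
    if (collected > depth)
        collected = depth;

    if (collected < skip)
        return -1;

    /* Drop the skipped frames, then clamp to the buffer capacity. */
    trace_nr = collected - skip;
    if (trace_nr > num_elem)
        trace_nr = num_elem;

    /* Copy the surviving frames and zero-fill the rest of the buffer. */
    memcpy(buf, frames + skip, trace_nr * sizeof(*buf));
    memset(buf + trace_nr, 0, (num_elem - trace_nr) * sizeof(*buf));
    return (long)(trace_nr * sizeof(*buf));
}
```

Take the stack from the example (leaf, caller, foo1..foo6, encoded here as
100, 101, 1..6) with an assumed max_stack of 127, num_elem == 4 and skip == 2.
With fixed == 0 the model fills buf with "foo1 foo2 0 0" and returns 16 bytes;
with fixed != 0 it fills "foo1 foo2 foo3 foo4" and returns 32 bytes.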

It would be good if you could add a test for the change. There is a
stacktrace test, prog_tests/stacktrace_map.c, that you can take a look at,
and you can add a subtest there.

Next time, you can submit a formal patch with `git send-email ...` to
this alias. That is easier to review than an attachment.

