From: Yonghong Song <yhs@fb.com>
To: Eugene Loh <eugene.loh@oracle.com>, <bpf@vger.kernel.org>
Subject: Re: using skip>0 with bpf_get_stack()
Date: Tue, 1 Jun 2021 14:48:50 -0700
Message-ID: <c65ac449-ec54-3dff-5447-8a318001285b@fb.com>
In-Reply-To: <30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com>
On 5/28/21 3:16 PM, Eugene Loh wrote:
> I have a question about bpf_get_stack(). I'm interested in the case
> skip > 0
> user_build_id == 0
> num_elem < sysctl_perf_event_max_stack
>
> The function sets
> init_nr = sysctl_perf_event_max_stack - num_elem;
> which means that get_perf_callchain() will return "num_elem" stack
> frames. Then, since we skip "skip" frames, we'll fill the user buffer
> with only "num_elem - skip" frames, with the remaining frames filled with zeros.
>
> For example, let's say the call stack is
> leaf <- caller <- foo1 <- foo2 <- foo3 <- foo4 <- foo5 <- foo6
>
> Let's say I pass bpf_get_stack() a buffer with num_elem==4 and ask
> skip==2. I would expect to skip 2 frames then get 4 frames, getting back:
> foo1 foo2 foo3 foo4
>
> Instead, I get
> foo1 foo2 0 0
> skipping 2 frames but also leaving frames zeroed out.
Thanks for reporting. I looked at the code, and it does seem that we may
have a kernel bug when skip != 0. As you described, we initially collect
only num_elem stack entries and then drop the first skip of them, so only
num_elem - skip frames end up in the buffer.
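A minimal userspace sketch of the arithmetic, assuming a simplified model in which the callchain walker stores entries starting at index init_nr and stops once it reaches max_stack (a stand-in for sysctl_perf_event_max_stack). The function name model_get_stack_current() and the integer "frame IDs" are hypothetical; this is not the kernel code, only the init_nr/skip bookkeeping from __bpf_get_stack() as I read it. With num_elem == 4 and skip == 2 it reproduces the "foo1 foo2 0 0" result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Example stack from the report, innermost frame first. Frames are
 * represented by hypothetical small integer IDs; 0 means "no frame". */
enum { LEAF = 1, CALLER, FOO1, FOO2, FOO3, FOO4, FOO5, FOO6, DEPTH = 8 };
static const uint64_t call_stack[DEPTH] = {
    LEAF, CALLER, FOO1, FOO2, FOO3, FOO4, FOO5, FOO6
};

/* Simplified model of the current init_nr computation. max_stack stands in
 * for sysctl_perf_event_max_stack. Returns the number of bytes "copied". */
static long model_get_stack_current(uint64_t *buf, uint32_t num_elem,
                                    uint32_t skip, uint32_t max_stack)
{
    uint32_t init_nr, recorded, trace_nr, i;

    assert(buf != NULL && num_elem > 0);

    /* Current code: leaves room for num_elem frames only. */
    if (max_stack < num_elem)
        init_nr = 0;
    else
        init_nr = max_stack - num_elem;

    /* The walker records at most max_stack - init_nr frames. */
    recorded = max_stack - init_nr;
    if (recorded > DEPTH)
        recorded = DEPTH;

    if (recorded < skip) {
        memset(buf, 0, num_elem * sizeof(*buf));
        return -14; /* -EFAULT */
    }

    /* The skipped frames are taken out of the num_elem already recorded. */
    trace_nr = recorded - skip;
    if (trace_nr > num_elem)
        trace_nr = num_elem;

    for (i = 0; i < trace_nr; i++)
        buf[i] = call_stack[skip + i];
    for (; i < num_elem; i++)
        buf[i] = 0; /* tail of the buffer is zero-filled */

    return (long)(trace_nr * sizeof(*buf));
}
```

Calling it with a 4-entry buffer, skip == 2 and max_stack == 127 fills the buffer with FOO1, FOO2, 0, 0 and returns 16 bytes, i.e. only two valid frames.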
>
> I think the init_nr computation should be:
>
> - if (sysctl_perf_event_max_stack < num_elem)
> + if (sysctl_perf_event_max_stack <= num_elem + skip)
> init_nr = 0;
> else
> - init_nr = sysctl_perf_event_max_stack - num_elem;
> + init_nr = sysctl_perf_event_max_stack - num_elem - skip;
From a rough check, this looks like one correct way to fix the issue.
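A sketch of the same simplified model with the proposed init_nr computation applied, under the same assumptions (the walker records at most max_stack - init_nr entries; model_get_stack_fixed() and the frame IDs are hypothetical). Reserving room for num_elem + skip entries means that, after skipping, the buffer can still be filled completely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same example stack as in the report, innermost frame first; 0 = no frame. */
enum { LEAF = 1, CALLER, FOO1, FOO2, FOO3, FOO4, FOO5, FOO6, DEPTH = 8 };
static const uint64_t call_stack[DEPTH] = {
    LEAF, CALLER, FOO1, FOO2, FOO3, FOO4, FOO5, FOO6
};

/* Simplified model with the proposed fix. max_stack stands in for
 * sysctl_perf_event_max_stack. Returns the number of bytes "copied". */
static long model_get_stack_fixed(uint64_t *buf, uint32_t num_elem,
                                  uint32_t skip, uint32_t max_stack)
{
    uint32_t init_nr, recorded, trace_nr, i;

    assert(buf != NULL && num_elem > 0);

    /* Proposed fix: leave room for the skipped frames as well. */
    if (max_stack <= num_elem + skip)
        init_nr = 0;
    else
        init_nr = max_stack - num_elem - skip;

    /* The walker records at most max_stack - init_nr frames. */
    recorded = max_stack - init_nr;
    if (recorded > DEPTH)
        recorded = DEPTH;

    if (recorded < skip) {
        memset(buf, 0, num_elem * sizeof(*buf));
        return -14; /* -EFAULT */
    }

    trace_nr = recorded - skip;
    if (trace_nr > num_elem)
        trace_nr = num_elem;

    for (i = 0; i < trace_nr; i++)
        buf[i] = call_stack[skip + i];
    for (; i < num_elem; i++)
        buf[i] = 0; /* tail of the buffer is zero-filled */

    return (long)(trace_nr * sizeof(*buf));
}
```

With num_elem == 4, skip == 2 and max_stack == 127 this yields FOO1, FOO2, FOO3, FOO4 and returns 32 bytes; with skip == 0 the result is the same as before the change.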
> Incidentally, the return value of the function is presumably the size of
> the returned data. Would it make sense to say so in
> include/uapi/linux/bpf.h?
The current documentation says:
* Return
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
I think you could make it more precise, e.g. by stating that a
non-negative return value is the number of bytes copied into **buf**.
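To illustrate what that documentation would let callers rely on: as far as I can tell from stackmap.c, a non-negative return is the number of bytes copied (copy_len), and the rest of the buffer up to *size* is zeroed. A caller could then derive the number of valid entries as sketched below. stack_entries_from_ret() is a hypothetical helper, not a libbpf or kernel API, and is written as plain C for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libbpf API): convert a bpf_get_stack() return
 * value into the number of valid entries in buf. elem_size would be
 * sizeof(__u64) for plain addresses, or sizeof(struct bpf_stack_build_id)
 * when BPF_F_USER_BUILD_ID is set. */
static long stack_entries_from_ret(long ret, size_t elem_size)
{
    assert(elem_size > 0);
    if (ret < 0)
        return ret;                /* negative errno, passed through */
    return ret / (long)elem_size;  /* ret is the number of bytes copied */
}
```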
Could you submit a patch for this? Thanks!
Thread overview: 7+ messages
2021-05-28 22:16 using skip>0 with bpf_get_stack() Eugene Loh
2021-06-01 21:48 ` Yonghong Song [this message]
2021-06-26 1:22 ` Eugene Loh
2021-06-29 3:33 ` Yonghong Song
2022-03-04 20:37 ` Namhyung Kim
2022-03-04 20:50 ` Eugene Loh
2022-03-04 21:16 ` Namhyung Kim