On 6/1/21 5:48 PM, Yonghong Song wrote: > > > On 5/28/21 3:16 PM, Eugene Loh wrote: >> I have a question about bpf_get_stack(). I'm interested in the case >>          skip > 0 >>          user_build_id == 0 >>          num_elem < sysctl_perf_event_max_stack >> >> The function sets >>          init_nr = sysctl_perf_event_max_stack - num_elem; >> which means that get_perf_callchain() will return "num_elem" stack >> frames.  Then, since we skip "skip" frames, we'll fill the user >> buffer with only "num_elem - skip" frames, the remaining frames being >> filled zero. >> >> For example, let's say the call stack is >>          leaf <- caller <- foo1 <- foo2 <- foo3 <- foo4 <- foo5 <- foo6 >> >> Let's say I pass bpf_get_stack() a buffer with num_elem==4 and ask >> skip==2.  I would expect to skip 2 frames then get 4 frames, getting >> back: >>          foo1  foo2  foo3  foo4 >> >> Instead, I get >>          foo1  foo2  0  0 >> skipping 2 frames but also leaving frames zeroed out. > > Thanks for reporting. I looked at codes and it does seem that we may > have a kernel bug when skip != 0. Basically as you described, > initially we collected num_elem stacks and later on we reduced by skip > so we got num_elem - skip as you calculated in the above. > >> >> I think the init_nr computation should be: >> >> -       if (sysctl_perf_event_max_stack < num_elem) >> +       if (sysctl_perf_event_max_stack <= num_elem + skip) >>                  init_nr = 0; >>          else >> -               init_nr = sysctl_perf_event_max_stack - num_elem; >> +               init_nr = sysctl_perf_event_max_stack - num_elem - skip; > > A rough check looks like this is one correct way to fix the issue. > >> Incidentally, the return value of the function is presumably the size >> of the returned data.  Would it make sense to say so in >> include/uapi/linux/bpf.h? > > The current documentation says: >  *      Return >  *              A non-negative value equal to or less than *size* on > success, >  *              or a negative error in case of failure. > > I think you can improve with more precise description such that > a non-negative value for the copied **buf** length. > > Could you submit a patch for this? Thanks! Sure.  Thanks for looking at this and sorry about the long delay getting back to you. Could you take a look at the attached, proposed patch?  As you see in the commit message, I'm unclear about the bpf_get_stack*_pe() variants.  They might use an earlier construct callchain, and I do not know how init_nr was set for them.