bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Song Liu <songliubraving@fb.com>
Cc: open list <linux-kernel@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	john fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@chromium.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Peter Ziljstra <peterz@infradead.org>
Subject: Re: [PATCH v2 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
Date: Wed, 15 Jul 2020 22:55:16 -0700	[thread overview]
Message-ID: <CAEf4BzZTYVq=oct0rUg5Y+wOc07k2Pcz-66M-NtjRFJTez0AvA@mail.gmail.com> (raw)
In-Reply-To: <20200715052601.2404533-2-songliubraving@fb.com>

On Tue, Jul 14, 2020 at 11:08 PM Song Liu <songliubraving@fb.com> wrote:
>
> Calling get_perf_callchain() on perf_events from PEBS entries may cause
> unwinder errors. To fix this issue, the callchain is fetched early. Such
> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
>
> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
> also cause unwinder errors. To fix this, add separate version of these
> two helpers, bpf_get_[stack|stackid]_pe. These two hepers use callchain in
> bpf_perf_event_data_kern->data->callchain.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  include/linux/bpf.h      |   2 +
>  kernel/bpf/stackmap.c    | 204 +++++++++++++++++++++++++++++++++++----
>  kernel/trace/bpf_trace.c |   4 +-
>  3 files changed, 190 insertions(+), 20 deletions(-)
>

Glad this approach worked out! Few minor bugs below, though.

[...]

> +       if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> +                              BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
> +               return -EINVAL;
> +
> +       user = flags & BPF_F_USER_STACK;
> +       kernel = !user;
> +
> +       has_kernel = !event->attr.exclude_callchain_kernel;
> +       has_user = !event->attr.exclude_callchain_user;
> +
> +       if ((kernel && !has_kernel) || (user && !has_user))
> +               return -EINVAL;
> +
> +       trace = ctx->data->callchain;
> +       if (!trace || (!has_kernel && !has_user))

(!has_kernel && !has_user) can never happen, it's checked by if above
(one of kernel or user is always true => one of has_user or has_kernel
is always true).

> +               return -EFAULT;
> +
> +       if (has_kernel && has_user) {
> +               __u64 nr_kernel = count_kernel_ip(trace);
> +               int ret;
> +
> +               if (kernel) {
> +                       __u64 nr = trace->nr;
> +
> +                       trace->nr = nr_kernel;
> +                       ret = __bpf_get_stackid(map, trace, flags);
> +
> +                       /* restore nr */
> +                       trace->nr = nr;
> +               } else { /* user */
> +                       u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +
> +                       skip += nr_kernel;
> +                       if (skip > ~BPF_F_SKIP_FIELD_MASK)

something fishy here: ~BPF_F_SKIP_FIELD_MASK is a really big number,
were you going to check that skip is not bigger than 255 (i.e., fits
within BPF_F_SKIP_FIELD_MASK)?

> +                               return -EFAULT;
> +
> +                       flags = (flags & ~BPF_F_SKIP_FIELD_MASK) |
> +                               (skip  & BPF_F_SKIP_FIELD_MASK);
> +                       ret = __bpf_get_stackid(map, trace, flags);
> +               }
> +               return ret;
> +       }
> +       return __bpf_get_stackid(map, trace, flags);
> +}
> +

[...]

> +
> +       has_kernel = !event->attr.exclude_callchain_kernel;
> +       has_user = !event->attr.exclude_callchain_user;
> +
> +       if ((kernel && !has_kernel) || (user && !has_user))
> +               goto clear;
> +
> +       err = -EFAULT;
> +       trace = ctx->data->callchain;
> +       if (!trace || (!has_kernel && !has_user))
> +               goto clear;

same as above for bpf_get_stackid, probably can be simplified

> +
> +       if (has_kernel && has_user) {
> +               __u64 nr_kernel = count_kernel_ip(trace);
> +               int ret;
> +
> +               if (kernel) {
> +                       __u64 nr = trace->nr;
> +
> +                       trace->nr = nr_kernel;
> +                       ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
> +                                             size, flags);
> +
> +                       /* restore nr */
> +                       trace->nr = nr;
> +               } else { /* user */
> +                       u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +
> +                       skip += nr_kernel;
> +                       if (skip > ~BPF_F_SKIP_FIELD_MASK)
> +                               goto clear;
> +

and here

> +                       flags = (flags & ~BPF_F_SKIP_FIELD_MASK) |
> +                               (skip  & BPF_F_SKIP_FIELD_MASK);

actually if you check that skip <= BPF_F_SKIP_FIELD_MASK, you don't
need to mask it here, just `| skip`

> +                       ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
> +                                             size, flags);
> +               }
> +               return ret;
> +       }
> +       return __bpf_get_stack(ctx->regs, NULL, trace, buf, size, flags);
> +clear:
> +       memset(buf, 0, size);
> +       return err;
> +
> +}
> +

[...]

  reply	other threads:[~2020-07-16  5:55 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15  5:25 [PATCH v2 bpf-next 0/2] bpf: fix stackmap on perf_events with PEBS Song Liu
2020-07-15  5:26 ` [PATCH v2 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF Song Liu
2020-07-16  5:55   ` Andrii Nakryiko [this message]
2020-07-17  1:07   ` kernel test robot
2020-07-15  5:26 ` [PATCH v2 bpf-next 2/2] selftests/bpf: add callchain_stackid Song Liu
2020-07-16  6:04   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzZTYVq=oct0rUg5Y+wOc07k2Pcz-66M-NtjRFJTez0AvA@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).