netdev.vger.kernel.org archive mirror
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Matt Mullins <mmullins@fb.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Andrew Hall <hall@fb.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Martin Lau <kafai@fb.com>, Yonghong Song <yhs@fb.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd
Date: Mon, 3 Jun 2019 17:56:08 -0700
Message-ID: <CAADnVQLVnOs-94AGpnu_MLZ3J7SgUuSKWJQC-ioy7zOV9iiiZw@mail.gmail.com>
In-Reply-To: <71c96268-7779-6e34-3078-5532d9f8fa55@iogearbox.net>

On Mon, Jun 3, 2019 at 5:44 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 06/04/2019 01:54 AM, Alexei Starovoitov wrote:
> > On Mon, Jun 3, 2019 at 4:48 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 06/04/2019 01:27 AM, Alexei Starovoitov wrote:
> >>> On Mon, Jun 3, 2019 at 3:59 PM Matt Mullins <mmullins@fb.com> wrote:
> >>>>
> >>>> If these are invariably non-nested, I can easily keep bpf_misc_sd when
> >>>> I resubmit.  There was no technical reason other than keeping the two
> >>>> codepaths as similar as possible.
> >>>>
> >>>> What resource gives you worry about doing this for the networking
> >>>> codepath?
> >>>
> >>> my preference would be to keep tracing and networking the same.
> >>> there is already minimal nesting in networking and probably we see
> >>> more when reuseport progs will start running from xdp and clsbpf
> >>>
> >>>>> Aside from that it's also really bad to miss events like this as exporting
> >>>>> through rb is critical. Why can't you have a per-CPU counter that selects a
> >>>>> sample data context based on nesting level in tracing? (I don't see a discussion
> >>>>> of this in your commit message.)
> >>>>
> >>>> This change would only drop messages if the same perf_event is
> >>>> attempted to be used recursively (i.e. the same CPU on the same
> >>>> PERF_EVENT_ARRAY map, as I haven't observed anything use index !=
> >>>> BPF_F_CURRENT_CPU in testing).
> >>>>
> >>>> I'll try to accomplish the same with a percpu nesting level and
> >>>> allocating 2 or 3 perf_sample_data per cpu.  I think that'll solve the
> >>>> same problem -- a local patch keeping track of the nesting level is how
> >>>> I got the above stack trace, too.
> >>>
> >>> I don't think counter approach works. The amount of nesting is unknown.
> >>> imo the approach taken in this patch is good.
> >>> I don't see any issue when event_outputs will be dropped for valid progs.
> >>> Only when user called the helper incorrectly without BPF_F_CURRENT_CPU.
> >>> But that's an error anyway.
> >>
> >> My main worry with this xchg() trick is that we'll miss to export crucial
> >> data with the EBUSY bailing out especially given nesting could increase in
> >> future as you state, so users might have a hard time debugging this kind of
> >> issue if they share the same perf event map among these programs, and no
> >> option to get to this data otherwise. Supporting nesting up to a certain
> >> level would still be better than a lost event which is also not reported
> >> through the usual way aka perf rb.
> >
> > I simply don't see this 'miss to export data' in all but contrived conditions.
> > Say two progs share the same perf event array.
> > One prog calls event_output and while rb logic is working
> > another prog needs to start executing and use the same event array
>
> Correct.
>
> > slot. Today it's only possible for tracing prog combined with networking,
> > but having two progs use the same event output array is pretty much
> > a user bug. Just like not passing BPF_F_CURRENT_CPU.
>
> I don't see the user bug part, why should that be a user bug?

because I suspect that 'struct bpf_event_entry' is not reentrant
(even w/o issues with 'struct perf_sample_data').

Even if we always use 'struct perf_sample_data' on stack, I suspect
the same 'struct bpf_event_entry' cannot be reused in the nested way.
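
For reference, the xchg()/EBUSY behaviour being debated can be sketched
in userspace C. This is only an illustration under stated assumptions,
not the code from the patch: 'struct sample_slot', slot_claim(),
slot_release() and event_output() are hypothetical names, C11 atomics
stand in for the kernel's xchg(), and a recursive call stands in for a
second prog firing while the first is still inside the output path.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one preallocated sample buffer per event. */
struct sample_slot {
	atomic_int busy;	/* 0 = free, 1 = claimed */
	int payload;
};

/* Claim the slot with an atomic exchange; a nested caller sees -EBUSY. */
static int slot_claim(struct sample_slot *s)
{
	if (atomic_exchange(&s->busy, 1))
		return -EBUSY;
	return 0;
}

static void slot_release(struct sample_slot *s)
{
	int was = atomic_exchange(&s->busy, 0);

	assert(was == 1);	/* releasing an unclaimed slot is a bug */
	(void)was;
}

/*
 * Emit one event. If nest > 0, simulate another program trying to use
 * the same slot before this call has finished. Every failed claim is
 * counted in *dropped, i.e. the event is lost.
 */
static int event_output(struct sample_slot *s, int value, int nest,
			int *dropped)
{
	int err = slot_claim(s);

	if (err) {
		(*dropped)++;
		return err;
	}
	s->payload = value;
	if (nest > 0)
		event_output(s, value + 1, nest - 1, dropped);
	slot_release(s);
	return 0;
}
```

In this sketch a non-nested call always succeeds, while any nested call
on the same slot is dropped; whether such nesting happens for valid
progs is the open question in this thread.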

> It's the same
> as if we would say that sharing a BPF hash map between networking programs
> attached to different hooks or networking and tracing would be a user bug
> which it is not. One concrete example would be cilium monitor where we
> currently expose skb trace and drop events as well as debug data through
> the same rb. This should be usable from any type that has perf_event_output
> helper enabled (e.g. XDP and tc/BPF) w/o requiring to walk yet another per
> cpu mmap rb from user space.

sure. those are valid use cases, but in all cases bpf_event_output() on a
particular 'struct bpf_event_entry' will finish before another one begins, no?
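
For comparison, the per-CPU nesting-level alternative raised earlier in
the thread (a counter that selects one of 2 or 3 sample contexts) might
look roughly like the sketch below. It is a userspace illustration
only: 'struct cpu_state', ctx_get(), ctx_put(), event_output_nested()
and the MAX_NEST value of 3 are assumptions, one 'struct cpu_state'
stands in for the per-CPU data, and a real kernel version would also
have to run with preemption disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_NEST 3	/* assumed nesting limit, not taken from the patch */

struct sample_ctx {
	int payload;
};

/* Hypothetical stand-in for the state kept per CPU. */
struct cpu_state {
	int nest_level;
	struct sample_ctx ctx[MAX_NEST];
};

/* Hand out the context for the current level, or NULL past the limit. */
static struct sample_ctx *ctx_get(struct cpu_state *cpu)
{
	if (cpu->nest_level >= MAX_NEST)
		return NULL;
	return &cpu->ctx[cpu->nest_level++];
}

static void ctx_put(struct cpu_state *cpu)
{
	assert(cpu->nest_level > 0);
	cpu->nest_level--;
}

/*
 * Emit one event; if nest > 0, simulate a nested program emitting
 * another event before this one completes. *delivered counts the
 * events that obtained a context.
 */
static int event_output_nested(struct cpu_state *cpu, int value, int nest,
			       int *delivered)
{
	struct sample_ctx *c = ctx_get(cpu);

	if (!c)
		return -EBUSY;
	c->payload = value;
	if (nest > 0)
		event_output_nested(cpu, value + 1, nest - 1, delivered);
	(*delivered)++;
	ctx_put(cpu);
	return 0;
}
```

Here nesting up to MAX_NEST levels deep is delivered and only deeper
nesting is dropped, which is the trade-off being weighed against the
concern that the amount of nesting is unknown.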

Thread overview: 17+ messages
2019-05-31 22:37 [PATCH bpf v2] bpf: preallocate a perf_sample_data per event fd Matt Mullins
2019-06-01  1:27 ` Song Liu
2019-06-01  3:12   ` Alexei Starovoitov
2019-06-03 13:08 ` Daniel Borkmann
2019-06-03 13:22   ` Daniel Borkmann
2019-06-03 22:58     ` Matt Mullins
2019-06-03 23:27       ` Alexei Starovoitov
2019-06-03 23:48         ` Daniel Borkmann
2019-06-03 23:54           ` Alexei Starovoitov
2019-06-04  0:43             ` Daniel Borkmann
2019-06-04  0:56               ` Alexei Starovoitov [this message]
2019-06-04  3:17               ` Matt Mullins
2019-06-06 18:54                 ` [PATCH bpf] bpf: fix nested bpf tracepoints with per-cpu data Matt Mullins
2019-06-06 22:13                   ` Jakub Kicinski
2019-06-06 22:39                     ` Matt Mullins
2019-06-07  2:59                   ` Andrii Nakryiko
2019-06-07  7:55                     ` Steven Rostedt
