bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alban Crequy <alban.crequy@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Stanislav Fomichev <sdf@google.com>,
	Alban Crequy <alban@kinvolk.io>,
	mauricio@kinvolk.io, kai@kinvolk.io
Subject: Re: [PATCH v2 bpf-next 7/7] docs/bpf: add BPF ring buffer design notes
Date: Mon, 25 May 2020 12:12:31 -0700	[thread overview]
Message-ID: <CAEf4BzY3mt8puNgOwi5ZWnVbXksnsXK_beG+HhhZutyBG-BO7A@mail.gmail.com> (raw)
In-Reply-To: <CAMXgnP424S5s-mrwFB_nuZNSuqLyi1K8r519WKVkyMBPtv1PMQ@mail.gmail.com>

On Mon, May 25, 2020 at 3:00 AM Alban Crequy <alban.crequy@gmail.com> wrote:
>
> Hi,
>
> Thanks. Both motivators look very interesting to me:
>
> On Sun, 17 May 2020 at 21:58, Andrii Nakryiko <andriin@fb.com> wrote:
> [...]
> > +Motivation
> > +----------
> > +There are two distinctive motivators for this work, which are not satisfied by
> > +existing perf buffer, which prompted creation of a new ring buffer
> > +implementation.
> > +  - more efficient memory utilization by sharing ring buffer across CPUs;
>
> I have a use case with traceloop
> (https://github.com/kinvolk/traceloop) where I use one
> BPF_MAP_TYPE_PERF_EVENT_ARRAY per container, so when the number of
> containers times the number of CPU is high, it can use a lot of
> memory.
>
> > +  - preserving ordering of events that happen sequentially in time, even
> > +  across multiple CPUs (e.g., fork/exec/exit events for a task).
>
> I had the problem to keep track of TCP connections and when
> tcp-connect and tcp-close events can be on different CPUs, it makes it
> difficult to get the correct order.

Yep, in one of BPF applications I've written, handling out-of-order
events was major complication to the design of data structures, as
well as user-space implementation logic.

>
> [...]
> > +There are a bunch of similarities between perf buffer
> > +(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
> > +  - variable-length records;
> > +  - if there is no more space left in ring buffer, reservation fails, no
> > +    blocking;
> [...]
>
> BPF_MAP_TYPE_PERF_EVENT_ARRAY can be set as both 'overwriteable' and
> 'backward': if there is no more space left in ring buffer, it would
> then overwrite the old events. For that, the buffer needs to be
> prepared with mmap(...PROT_READ) instead of mmap(...PROT_READ |
> PROT_WRITE), and set the write_backward flag. See details in commit
> 9ecda41acb97 ("perf/core: Add ::write_backward attribute to perf
> event"):
>
> struct perf_event_attr attr = {0,};
> attr.write_backward = 1; /* backward */
> fd = perf_event_open_map(&attr, ...);
> base = mmap(fd, 0, size, PROT_READ /* overwriteable */, MAP_SHARED);
>
> I use overwriteable and backward ring buffers in traceloop: buffers
> are continuously overwritten and are usually not read, except when a
> user explicitly asks for it (e.g. to inspect the last few events of an
> application after a crash). If BPF_MAP_TYPE_RINGBUF implements the
> same features, then I would be able to switch and use less memory.
>
> Do you think it will be possible to implement that in BPF_MAP_TYPE_RINGBUF?
>

I think it could be implemented similarly. Consumer_pos would be
ignored, producer_pos would point to the beginning of record and
decremented on new reservation. All the implementation and semantics
would stay. Extending ringbuf itself to enable this is also trivial,
it could be just extra map_flag passed when map is created,
consumer_pos page would become mmap()'able as R/O, of course.

But I fail to see how consumer can be 100% certain it's not reading
garbage data, especially on 32-bit architectures, where wrapping over
32-bit producer position is actually quite easy. Just checking
producer position before/after read isn't completely correct. Ignoring
that problem, the only sane way (IMO) to do this would mean copying
each record into a "stable" memory, before actually doing anything
with it, which is a pretty bad performance hit as well.

So all in all, such mode could be added, but certainly in a separate
patch set and after some good discussion :).

> Cheers,
> Alban

      reply	other threads:[~2020-05-25 19:12 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-17 19:57 [PATCH v2 bpf-next 0/7] BPF ring buffer Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 1/7] bpf: implement BPF ring buffer and verifier support for it Andrii Nakryiko
2020-05-19 12:57   ` kbuild test robot
2020-05-19 23:53   ` kbuild test robot
2020-05-22  0:25   ` Paul E. McKenney
2020-05-22 18:46     ` Andrii Nakryiko
2020-05-25 16:01       ` Paul E. McKenney
2020-05-25 18:45         ` Andrii Nakryiko
2020-05-22  1:07   ` Alexei Starovoitov
2020-05-22 18:48     ` Andrii Nakryiko
2020-05-25 20:34     ` Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 2/7] tools/memory-model: add BPF ringbuf MPSC litmus tests Andrii Nakryiko
2020-05-22  0:34   ` Paul E. McKenney
2020-05-22 18:51     ` Andrii Nakryiko
2020-05-25 23:33       ` Andrii Nakryiko
2020-05-26  3:05         ` Paul E. McKenney
2020-05-17 19:57 ` [PATCH v2 bpf-next 3/7] bpf: track reference type in verifier Andrii Nakryiko
2020-05-22  1:13   ` Alexei Starovoitov
2020-05-22 18:53     ` Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 4/7] libbpf: add BPF ring buffer support Andrii Nakryiko
2020-05-22  1:15   ` Alexei Starovoitov
2020-05-22 18:56     ` Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 5/7] selftests/bpf: add BPF ringbuf selftests Andrii Nakryiko
2020-05-22  1:20   ` Alexei Starovoitov
2020-05-22 18:58     ` Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 6/7] bpf: add BPF ringbuf and perf buffer benchmarks Andrii Nakryiko
2020-05-22  1:21   ` Alexei Starovoitov
2020-05-22 19:07     ` Andrii Nakryiko
2020-05-17 19:57 ` [PATCH v2 bpf-next 7/7] docs/bpf: add BPF ring buffer design notes Andrii Nakryiko
2020-05-22  1:23   ` Alexei Starovoitov
2020-05-22 19:08     ` Andrii Nakryiko
2020-05-25  9:59   ` Alban Crequy
2020-05-25 19:12     ` Andrii Nakryiko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzY3mt8puNgOwi5ZWnVbXksnsXK_beG+HhhZutyBG-BO7A@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=alban.crequy@gmail.com \
    --cc=alban@kinvolk.io \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jonathan.lemon@gmail.com \
    --cc=kai@kinvolk.io \
    --cc=kernel-team@fb.com \
    --cc=mauricio@kinvolk.io \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).