From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Kevin Sheldrake <Kevin.Sheldrake@microsoft.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: Maximum size of record over perf ring buffer?
Date: Sun, 19 Jul 2020 21:35:19 -0700
Message-ID: <CAEf4BzbE5+V8GJJwASgJJyCdX3P41GeoK14szprZq4i_OrQFOg@mail.gmail.com>
In-Reply-To: <AM5PR83MB02104FB714E7E29DD90D8E06FB7C0@AM5PR83MB0210.EURPRD83.prod.outlook.com>

On Fri, Jul 17, 2020 at 7:24 AM Kevin Sheldrake
<Kevin.Sheldrake@microsoft.com> wrote:
>
> Hello
>
> I'm building a tool using EBPF/libbpf/C and I've run into an issue that I'd like to ask about.  I haven't managed to find documentation for the maximum size of a record that can be sent over the perf ring buffer, but experimentation (on kernel 5.3 (x64) with latest libbpf from github) suggests it is just short of 64KB.  Please could someone confirm if that's the case or not?  My experiments suggest that sending a record that is greater than 64KB results in the size reported in the callback being correct but the records overlapping, causing corruption if they are not serviced as quickly as they arrive.  Setting the record to exactly 64KB results in no records being received at all.
>
> For reference, I'm using perf_buffer__new() and perf_buffer__poll() on the userland side; and bpf_perf_event_output(ctx, &event_map, BPF_F_CURRENT_CPU, event, sizeof(event_s)) on the EBPF side.
>
> Additionally, is there a better architecture for sending large volumes of data (>64KB) back from the EBPF program to userland, such as a different ring buffer, a map, some kind of shared mmaped segment, etc, other than simply fragmenting the data?  Please excuse my naivety as I'm relatively new to the world of EBPF.
>

I'm not aware of any such limitation for the perf ring buffer, and I
haven't had a chance to validate this. It would be great if you could
provide a small repro so that someone can take a deeper look; it does
sound like a bug if you really get clobbered data. It might actually
be down to how you set up perfbuf: AFAIK, it has a mode in which it
overwrites data that isn't consumed quickly enough, but you need to
consciously enable that mode.

But apart from that, shameless plug here, you can try the new BPF ring
buffer ([0]), available in 5.8+ kernels. It lets you avoid the extra
copy of data you get with bpf_perf_event_output(), if you use BPF
ringbuf's bpf_ringbuf_reserve() + bpf_ringbuf_submit() API. It also
has a bpf_ringbuf_output() API, which is logically equivalent to
bpf_perf_event_output(). And it has a very high limit on sample size,
up to 512MB per sample.
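
To illustrate why the reserve + submit pattern avoids a copy while the
output-style call does not, here is a minimal single-threaded userspace
sketch. It is a toy model only, not the kernel implementation and not
BPF code: every name in it (toy_ringbuf, toy_reserve(), toy_submit(),
toy_output(), toy_consume(), TOY_RB_SIZE) is hypothetical, and it
assumes records never wrap around the end of the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_RB_SIZE 4096u                 /* hypothetical capacity in bytes */

struct toy_hdr { uint32_t len; uint32_t busy; };  /* per-record header */

struct toy_ringbuf {
    uint8_t *data;        /* TOY_RB_SIZE bytes obtained from calloc() */
    size_t producer;      /* total bytes reserved so far */
    size_t consumer;      /* total bytes consumed so far */
    unsigned int copies;  /* payload copies made by toy_output() */
};

static int toy_init(struct toy_ringbuf *rb)
{
    memset(rb, 0, sizeof(*rb));
    rb->data = calloc(1, TOY_RB_SIZE);
    return rb->data ? 0 : -1;
}

/* Reserve len payload bytes inside the buffer itself. The record stays
 * "busy" (invisible to the consumer) until toy_submit(). NULL if full. */
static void *toy_reserve(struct toy_ringbuf *rb, uint32_t len)
{
    size_t need = (sizeof(struct toy_hdr) + len + 7) & ~(size_t)7;
    size_t off = rb->producer % TOY_RB_SIZE;

    if (rb->producer - rb->consumer + need > TOY_RB_SIZE)
        return NULL;                      /* not enough free space */
    if (off + need > TOY_RB_SIZE)
        return NULL;                      /* toy model: no wrap-around */

    struct toy_hdr *h = (struct toy_hdr *)(rb->data + off);
    h->len = len;
    h->busy = 1;
    rb->producer += need;
    return h + 1;                         /* payload follows the header */
}

/* Publish a previously reserved record; no payload copy happens here. */
static void toy_submit(void *payload)
{
    struct toy_hdr *h = (struct toy_hdr *)payload - 1;
    assert(h->busy);                      /* guard against double submit */
    h->busy = 0;
}

/* Copying variant: the sample was built elsewhere and is copied in. */
static int toy_output(struct toy_ringbuf *rb, const void *src, uint32_t len)
{
    void *dst = toy_reserve(rb, len);
    if (!dst)
        return -1;
    memcpy(dst, src, len);                /* the extra copy */
    rb->copies++;
    toy_submit(dst);
    return 0;
}

/* Consumer: return the next submitted payload, or NULL if none is ready. */
static const void *toy_consume(struct toy_ringbuf *rb, uint32_t *len)
{
    if (rb->consumer == rb->producer)
        return NULL;
    struct toy_hdr *h =
        (struct toy_hdr *)(rb->data + rb->consumer % TOY_RB_SIZE);
    if (h->busy)
        return NULL;                      /* reserved but not yet submitted */
    *len = h->len;
    rb->consumer += (sizeof(*h) + h->len + 7) & ~(size_t)7;
    return h + 1;
}
```

In this model a producer that calls toy_reserve(), fills the returned
pointer in place and then calls toy_submit() leaves the copies counter
at zero, whereas each toy_output() call increments it. A record that
has been reserved but not submitted is not returned by toy_consume().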

Keep in mind, BPF ringbuf is an MPSC design, and if you use just one
BPF ringbuf across all CPUs, you might run into some contention across
multiple CPUs. That is acceptable in a lot of the applications I was
targeting, but if you have a high frequency of events (keep in mind,
throughput doesn't matter, only contention on sample reservation
matters), you might want to use an array of BPF ringbufs to scale
throughput. You can do 1 ringbuf per CPU for ultimate performance at
the expense of memory usage (that's the perf ring buffer setup), but
BPF ringbuf is flexible enough to allow any topology that makes sense
for your use case, from 1 shared ringbuf across all CPUs, to anything
in between.
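
As a rough illustration of the topology trade-off, the sketch below
maps a CPU number onto one of N ringbufs and reports the worst-case
number of CPUs contending on a single ringbuf. Both helper names are
hypothetical, and the modulo mapping is just one possible policy. In a
real BPF program the CPU number would presumably come from
bpf_get_smp_processor_id() and the ringbufs would be looked up through
a map-in-map, but that wiring is an assumption and is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: which of nr_ringbufs should this CPU write to? */
static unsigned int ringbuf_index(unsigned int cpu, unsigned int nr_ringbufs)
{
    assert(nr_ringbufs > 0);
    return cpu % nr_ringbufs;
}

/* Worst-case number of CPUs sharing (and contending on) one ringbuf
 * under the modulo mapping above. */
static unsigned int max_producers_per_ringbuf(unsigned int nr_cpus,
                                              unsigned int nr_ringbufs)
{
    assert(nr_ringbufs > 0);
    return (nr_cpus + nr_ringbufs - 1) / nr_ringbufs;
}
```

With nr_ringbufs set to 1 every CPU lands on index 0 (one shared
ringbuf); with nr_ringbufs equal to the number of CPUs each CPU gets
its own ringbuf, matching the perf ring buffer layout; values in
between trade memory against reservation contention.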


  [0] https://patchwork.ozlabs.org/project/netdev/list/?series=180119&state=*

> Thank you in anticipation
>
> Kevin Sheldrake
>
