All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/1] libbpf: perfbuf custom event reader
@ 2022-07-08  6:04 Jon Doron
  2022-07-08  6:04 ` [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers Jon Doron
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Doron @ 2022-07-08  6:04 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel; +Cc: Jon Doron

From: Jon Doron <jond@wiz.io>

Add support for writing a custom event reader, by exposing the ring
buffer state, and allowing to set it's tail.

Few simple examples where this type of needed:
1. perf_event_read_simple is allocating using malloc, perhaps you want
   to handle the wrap-around in some other way.
2. Since perf buf is per-cpu then the order of the events is not
   guarnteed, for example:
   Given 3 events where each event has a timestamp t0 < t1 < t2,
   and the events are spread on more than 1 CPU, then we can end
   up with the following state in the ring buf:
   CPU[0] => [t0, t2]
   CPU[1] => [t1]
   When you consume the events from CPU[0], you could know there is
   a t1 missing, (assuming there are no drops, and your event data
   contains a sequential index).
   So now one can simply do the following, for CPU[0], you can store
   the address of t0 and t2 in an array (without moving the tail, so
   there data is not perished) then move on the CPU[1] and set the
   address of t1 in the same array.
   So you end up with something like:
   void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
   and move the tails as you process in order.
3. Assuming there are multiple CPUs and we want to start draining the
   messages from them, then we can "pick" with which one to start with
   according to the remaining free space in the ring buffer.

Jon Doron (1):
  libbpf: perfbuf: allow raw access to buffers

 tools/lib/bpf/libbpf.c   | 40 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h   | 25 +++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  2 ++
 3 files changed, 67 insertions(+)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
  2022-07-08  6:04 [PATCH bpf-next v2 0/1] libbpf: perfbuf custom event reader Jon Doron
@ 2022-07-08  6:04 ` Jon Doron
  2022-07-08 22:48   ` Andrii Nakryiko
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Doron @ 2022-07-08  6:04 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel; +Cc: Jon Doron

From: Jon Doron <jond@wiz.io>

Add support for writing a custom event reader, by exposing the ring
buffer state, and allowing to set it's tail.

Few simple examples where this type of needed:
1. perf_event_read_simple is allocating using malloc, perhaps you want
   to handle the wrap-around in some other way.
2. Since perf buf is per-cpu then the order of the events is not
   guarnteed, for example:
   Given 3 events where each event has a timestamp t0 < t1 < t2,
   and the events are spread on more than 1 CPU, then we can end
   up with the following state in the ring buf:
   CPU[0] => [t0, t2]
   CPU[1] => [t1]
   When you consume the events from CPU[0], you could know there is
   a t1 missing, (assuming there are no drops, and your event data
   contains a sequential index).
   So now one can simply do the following, for CPU[0], you can store
   the address of t0 and t2 in an array (without moving the tail, so
   there data is not perished) then move on the CPU[1] and set the
   address of t1 in the same array.
   So you end up with something like:
   void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
   and move the tails as you process in order.
3. Assuming there are multiple CPUs and we want to start draining the
   messages from them, then we can "pick" with which one to start with
   according to the remaining free space in the ring buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
---
 tools/lib/bpf/libbpf.c   | 40 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h   | 25 +++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  2 ++
 3 files changed, 67 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e89cc9c885b3..37299aa05185 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12433,6 +12433,46 @@ static int perf_buffer__process_records(struct perf_buffer *pb,
 	return 0;
 }
 
+int perf_buffer__raw_ring_buf(const struct perf_buffer *pb, size_t buf_idx,
+			      void **base, size_t *buf_size, __u64 *head,
+			      __u64 *tail)
+{
+	struct perf_cpu_buf *cpu_buf;
+	struct perf_event_mmap_page *header;
+
+	if (buf_idx >= pb->cpu_cnt)
+		return libbpf_err(-EINVAL);
+
+	cpu_buf = pb->cpu_bufs[buf_idx];
+	if (!cpu_buf)
+		return libbpf_err(-ENOENT);
+
+	header = cpu_buf->base;
+	*head = ring_buffer_read_head(header);
+	*tail = header->data_tail;
+	*base = ((__u8 *)header) + pb->page_size;
+	*buf_size = pb->mmap_size;
+	return 0;
+}
+
+int perf_buffer__set_ring_buf_tail(const struct perf_buffer *pb, size_t buf_idx,
+				   __u64 tail)
+{
+	struct perf_cpu_buf *cpu_buf;
+	struct perf_event_mmap_page *header;
+
+	if (buf_idx >= pb->cpu_cnt)
+		return libbpf_err(-EINVAL);
+
+	cpu_buf = pb->cpu_bufs[buf_idx];
+	if (!cpu_buf)
+		return libbpf_err(-ENOENT);
+
+	header = cpu_buf->base;
+	ring_buffer_write_tail(header, tail);
+	return 0;
+}
+
 int perf_buffer__epoll_fd(const struct perf_buffer *pb)
 {
 	return pb->epoll_fd;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9e9a3fd3edd8..035a0ce42139 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1381,6 +1381,31 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
+/** @brief **perf_buffer__raw_ring_buf()** gets the ring buffer information for
+ * a given CPU perf buffer.
+ * This API and **perf_buffer__set_ring_buf_tail()** allow low level access
+ * to the ring buffer in order to implement a custom ring buffer drain
+ * mechanisim.
+ *
+ * @param pb the perf_buffer instance
+ * @param buf_idx the index of the perf buffer
+ * @param base will get the base of the ring buffer mmap
+ * @param buf_size will get size of the ring buffer mmap
+ * @param head gets the ring buffer head pointer
+ * @param tail gets the ring buffer tail pointer
+ * @return 0, for success
+ */
+LIBBPF_API int perf_buffer__raw_ring_buf(const struct perf_buffer *pb,
+					 size_t buf_idx, void **base,
+					 size_t *buf_size, __u64 *head,
+					 __u64 *tail);
+/** @brief **perf_buffer__set_ring_buf_tail()** sets the ring buffer tail
+ * @param pb the perf_buffer instance
+ * @param buf_idx the index of the perf buffer
+ * @param tail sets the value up-until where messages were consumed.
+ */
+LIBBPF_API int perf_buffer__set_ring_buf_tail(const struct perf_buffer *pb,
+					      size_t buf_idx, __u64 tail);
 
 typedef enum bpf_perf_event_ret
 	(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 52973cffc20c..22fbc97839dd 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -458,6 +458,8 @@ LIBBPF_0.8.0 {
 		bpf_program__set_insns;
 		libbpf_register_prog_handler;
 		libbpf_unregister_prog_handler;
+		perf_buffer__raw_ring_buf;
+		perf_buffer__set_ring_buf_tail;
 } LIBBPF_0.7.0;
 
 LIBBPF_1.0.0 {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
  2022-07-08  6:04 ` [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers Jon Doron
@ 2022-07-08 22:48   ` Andrii Nakryiko
  2022-07-09  2:53     ` Jon Doron
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2022-07-08 22:48 UTC (permalink / raw)
  To: Jon Doron
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jon Doron

On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer state, and allowing to set it's tail.
>
> Few simple examples where this type of needed:
> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>    to handle the wrap-around in some other way.
> 2. Since perf buf is per-cpu then the order of the events is not
>    guarnteed, for example:
>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>    and the events are spread on more than 1 CPU, then we can end
>    up with the following state in the ring buf:
>    CPU[0] => [t0, t2]
>    CPU[1] => [t1]
>    When you consume the events from CPU[0], you could know there is
>    a t1 missing, (assuming there are no drops, and your event data
>    contains a sequential index).
>    So now one can simply do the following, for CPU[0], you can store
>    the address of t0 and t2 in an array (without moving the tail, so
>    there data is not perished) then move on the CPU[1] and set the
>    address of t1 in the same array.
>    So you end up with something like:
>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>    and move the tails as you process in order.
> 3. Assuming there are multiple CPUs and we want to start draining the
>    messages from them, then we can "pick" with which one to start with
>    according to the remaining free space in the ring buffer.
>

All the above use cases are sufficiently advanced that you as such an
advanced user should be able to write your own perfbuf consumer code.
There isn't a lot of code to set everything up, but then you get full
control over all the details.

I don't see this API as a generally useful, it feels way too low-level
and special for inclusion in libbpf.

> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
>  tools/lib/bpf/libbpf.c   | 40 ++++++++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h   | 25 +++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  2 ++
>  3 files changed, 67 insertions(+)
>

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
  2022-07-08 22:48   ` Andrii Nakryiko
@ 2022-07-09  2:53     ` Jon Doron
  2022-07-10  5:23       ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Doron @ 2022-07-09  2:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Jon Doron

On 08/07/2022, Andrii Nakryiko wrote:
>On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>>
>> From: Jon Doron <jond@wiz.io>
>>
>> Add support for writing a custom event reader, by exposing the ring
>> buffer state, and allowing to set it's tail.
>>
>> Few simple examples where this type of needed:
>> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>>    to handle the wrap-around in some other way.
>> 2. Since perf buf is per-cpu then the order of the events is not
>>    guarnteed, for example:
>>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>>    and the events are spread on more than 1 CPU, then we can end
>>    up with the following state in the ring buf:
>>    CPU[0] => [t0, t2]
>>    CPU[1] => [t1]
>>    When you consume the events from CPU[0], you could know there is
>>    a t1 missing, (assuming there are no drops, and your event data
>>    contains a sequential index).
>>    So now one can simply do the following, for CPU[0], you can store
>>    the address of t0 and t2 in an array (without moving the tail, so
>>    there data is not perished) then move on the CPU[1] and set the
>>    address of t1 in the same array.
>>    So you end up with something like:
>>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>>    and move the tails as you process in order.
>> 3. Assuming there are multiple CPUs and we want to start draining the
>>    messages from them, then we can "pick" with which one to start with
>>    according to the remaining free space in the ring buffer.
>>
>
>All the above use cases are sufficiently advanced that you as such an
>advanced user should be able to write your own perfbuf consumer code.
>There isn't a lot of code to set everything up, but then you get full
>control over all the details.
>
>I don't see this API as a generally useful, it feels way too low-level
>and special for inclusion in libbpf.
>

Hi Andrii,

I understand, but I was still hoping you will be willing to expose this 
API.
libbpf has very simple and nice binding to Rust and other languages, 
implementing one of those use cases in the bindings can make things much 
simpler than using some libc or syscall APIs, instead of enjoying all
the simplicity that you get for free in libbpf.

Hope you will be willing to reconsider :)

Have a nice weekend,
-- Jon.

>> Signed-off-by: Jon Doron <jond@wiz.io>
>> ---
>>  tools/lib/bpf/libbpf.c   | 40 ++++++++++++++++++++++++++++++++++++++++
>>  tools/lib/bpf/libbpf.h   | 25 +++++++++++++++++++++++++
>>  tools/lib/bpf/libbpf.map |  2 ++
>>  3 files changed, 67 insertions(+)
>>
>
>[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
  2022-07-09  2:53     ` Jon Doron
@ 2022-07-10  5:23       ` Alexei Starovoitov
       [not found]         ` <CAP7QCohvoDZ0sk6sqA3112xsM4xzUc9uRHXiDNyt7M4jc3oUmg@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2022-07-10  5:23 UTC (permalink / raw)
  To: Jon Doron
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron

On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>
> On 08/07/2022, Andrii Nakryiko wrote:
> >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
> >>
> >> From: Jon Doron <jond@wiz.io>
> >>
> >> Add support for writing a custom event reader, by exposing the ring
> >> buffer state, and allowing to set it's tail.
> >>
> >> Few simple examples where this type of needed:
> >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
> >>    to handle the wrap-around in some other way.
> >> 2. Since perf buf is per-cpu then the order of the events is not
> >>    guarnteed, for example:
> >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
> >>    and the events are spread on more than 1 CPU, then we can end
> >>    up with the following state in the ring buf:
> >>    CPU[0] => [t0, t2]
> >>    CPU[1] => [t1]
> >>    When you consume the events from CPU[0], you could know there is
> >>    a t1 missing, (assuming there are no drops, and your event data
> >>    contains a sequential index).
> >>    So now one can simply do the following, for CPU[0], you can store
> >>    the address of t0 and t2 in an array (without moving the tail, so
> >>    there data is not perished) then move on the CPU[1] and set the
> >>    address of t1 in the same array.
> >>    So you end up with something like:
> >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
> >>    and move the tails as you process in order.
> >> 3. Assuming there are multiple CPUs and we want to start draining the
> >>    messages from them, then we can "pick" with which one to start with
> >>    according to the remaining free space in the ring buffer.
> >>
> >
> >All the above use cases are sufficiently advanced that you as such an
> >advanced user should be able to write your own perfbuf consumer code.
> >There isn't a lot of code to set everything up, but then you get full
> >control over all the details.
> >
> >I don't see this API as a generally useful, it feels way too low-level
> >and special for inclusion in libbpf.
> >
>
> Hi Andrii,
>
> I understand, but I was still hoping you will be willing to expose this
> API.
> libbpf has very simple and nice binding to Rust and other languages,
> implementing one of those use cases in the bindings can make things much
> simpler than using some libc or syscall APIs, instead of enjoying all
> the simplicity that you get for free in libbpf.
>
> Hope you will be willing to reconsider :)

The discussion would have been different if you mentioned that
motivation in the commit logs.
Please provide links to "Rust and other languages" code that
uses this api.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
       [not found]         ` <CAP7QCohvoDZ0sk6sqA3112xsM4xzUc9uRHXiDNyt7M4jc3oUmg@mail.gmail.com>
@ 2022-07-10 15:15           ` Alexei Starovoitov
       [not found]             ` <CAP7QCogdYrsfGvEvhg5R8rQvWDe=o-nxgmqubZtfucH1zNc-RA@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2022-07-10 15:15 UTC (permalink / raw)
  To: Jon Doron
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron

On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@gmail.com> wrote:
>
> I was referring to the following:
> https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs

How does your patch help libbpf-rs?

Please don't top post.

> Thanks,
> -- Jon.
>
> On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>
>> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>> >
>> > On 08/07/2022, Andrii Nakryiko wrote:
>> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>> > >>
>> > >> From: Jon Doron <jond@wiz.io>
>> > >>
>> > >> Add support for writing a custom event reader, by exposing the ring
>> > >> buffer state, and allowing to set it's tail.
>> > >>
>> > >> Few simple examples where this type of needed:
>> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> > >>    to handle the wrap-around in some other way.
>> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> > >>    guarnteed, for example:
>> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> > >>    and the events are spread on more than 1 CPU, then we can end
>> > >>    up with the following state in the ring buf:
>> > >>    CPU[0] => [t0, t2]
>> > >>    CPU[1] => [t1]
>> > >>    When you consume the events from CPU[0], you could know there is
>> > >>    a t1 missing, (assuming there are no drops, and your event data
>> > >>    contains a sequential index).
>> > >>    So now one can simply do the following, for CPU[0], you can store
>> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> > >>    there data is not perished) then move on the CPU[1] and set the
>> > >>    address of t1 in the same array.
>> > >>    So you end up with something like:
>> > >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>> > >>    and move the tails as you process in order.
>> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> > >>    messages from them, then we can "pick" with which one to start with
>> > >>    according to the remaining free space in the ring buffer.
>> > >>
>> > >
>> > >All the above use cases are sufficiently advanced that you as such an
>> > >advanced user should be able to write your own perfbuf consumer code.
>> > >There isn't a lot of code to set everything up, but then you get full
>> > >control over all the details.
>> > >
>> > >I don't see this API as a generally useful, it feels way too low-level
>> > >and special for inclusion in libbpf.
>> > >
>> >
>> > Hi Andrii,
>> >
>> > I understand, but I was still hoping you will be willing to expose this
>> > API.
>> > libbpf has very simple and nice binding to Rust and other languages,
>> > implementing one of those use cases in the bindings can make things much
>> > simpler than using some libc or syscall APIs, instead of enjoying all
>> > the simplicity that you get for free in libbpf.
>> >
>> > Hope you will be willing to reconsider :)
>>
>> The discussion would have been different if you mentioned that
>> motivation in the commit logs.
>> Please provide links to "Rust and other languages" code that
>> uses this api.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
       [not found]             ` <CAP7QCogdYrsfGvEvhg5R8rQvWDe=o-nxgmqubZtfucH1zNc-RA@mail.gmail.com>
@ 2022-07-12  4:01               ` Andrii Nakryiko
       [not found]                 ` <CAP7QCog2hiDsb1Z-hNNsUPTja3hXfNa-auv1yrwb0YWWrymWow@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2022-07-12  4:01 UTC (permalink / raw)
  To: Jon Doron
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron

On Sun, Jul 10, 2022 at 10:07 AM Jon Doron <arilou@gmail.com> wrote:
>
>
> On Sun, Jul 10, 2022, 18:16 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>
>> On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@gmail.com> wrote:
>> >
>> > I was referring to the following:
>> > https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs
>>
>> How does your patch help libbpf-rs?
>>
>> Please don't top post.
>
>
> You will be able to implement a custom perf buffer consumer, as it already has good bindings with libbpf-sys which is built from the C headers
>
> Sorry for the top posting I'm not home and replying from my phone
>

I can see us exposing per-CPU buffers for (very) advanced users, something like:

int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void
**buf, size_t buf_sz);

Then in combination with perf_buffer__buffer_fd() you can implement
your own polling and processing. So you just use libbpf logic to setup
buffers, but then don't call perf_buffer__poll() at all and read
records and update tail on your own.

But this combination of perf_buffer__raw_ring_buf() and
perf_buffer__set_ring_buf_tail() seems like a bad API, sorry.


>>
>> > Thanks,
>> > -- Jon.
>> >
>> > On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >>
>> >> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >
>> >> > On 08/07/2022, Andrii Nakryiko wrote:
>> >> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>> >> > >>
>> >> > >> From: Jon Doron <jond@wiz.io>
>> >> > >>
>> >> > >> Add support for writing a custom event reader, by exposing the ring
>> >> > >> buffer state, and allowing to set it's tail.
>> >> > >>
>> >> > >> Few simple examples where this type of needed:
>> >> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> >> > >>    to handle the wrap-around in some other way.
>> >> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> >> > >>    guarnteed, for example:
>> >> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> >> > >>    and the events are spread on more than 1 CPU, then we can end
>> >> > >>    up with the following state in the ring buf:
>> >> > >>    CPU[0] => [t0, t2]
>> >> > >>    CPU[1] => [t1]
>> >> > >>    When you consume the events from CPU[0], you could know there is
>> >> > >>    a t1 missing, (assuming there are no drops, and your event data
>> >> > >>    contains a sequential index).
>> >> > >>    So now one can simply do the following, for CPU[0], you can store
>> >> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> >> > >>    there data is not perished) then move on the CPU[1] and set the
>> >> > >>    address of t1 in the same array.
>> >> > >>    So you end up with something like:
>> >> > >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>> >> > >>    and move the tails as you process in order.
>> >> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> >> > >>    messages from them, then we can "pick" with which one to start with
>> >> > >>    according to the remaining free space in the ring buffer.
>> >> > >>
>> >> > >
>> >> > >All the above use cases are sufficiently advanced that you as such an
>> >> > >advanced user should be able to write your own perfbuf consumer code.
>> >> > >There isn't a lot of code to set everything up, but then you get full
>> >> > >control over all the details.
>> >> > >
>> >> > >I don't see this API as a generally useful, it feels way too low-level
>> >> > >and special for inclusion in libbpf.
>> >> > >
>> >> >
>> >> > Hi Andrii,
>> >> >
>> >> > I understand, but I was still hoping you will be willing to expose this
>> >> > API.
>> >> > libbpf has very simple and nice binding to Rust and other languages,
>> >> > implementing one of those use cases in the bindings can make things much
>> >> > simpler than using some libc or syscall APIs, instead of enjoying all
>> >> > the simplicity that you get for free in libbpf.
>> >> >
>> >> > Hope you will be willing to reconsider :)
>> >>
>> >> The discussion would have been different if you mentioned that
>> >> motivation in the commit logs.
>> >> Please provide links to "Rust and other languages" code that
>> >> uses this api.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
       [not found]                 ` <CAP7QCog2hiDsb1Z-hNNsUPTja3hXfNa-auv1yrwb0YWWrymWow@mail.gmail.com>
@ 2022-07-12  4:50                   ` Andrii Nakryiko
       [not found]                     ` <CAP7QCojcQ0kGMDO2BZH+zea7tKiWQqx_qo0KZ028hfX_2WLs9A@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2022-07-12  4:50 UTC (permalink / raw)
  To: Jon Doron
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron

On Mon, Jul 11, 2022 at 9:19 PM Jon Doron <arilou@gmail.com> wrote:
>
>
>
> On Tue, Jul 12, 2022, 07:01 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>
>> On Sun, Jul 10, 2022 at 10:07 AM Jon Doron <arilou@gmail.com> wrote:
>> >
>> >
>> > On Sun, Jul 10, 2022, 18:16 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >>
>> >> On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >
>> >> > I was referring to the following:
>> >> > https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs
>> >>
>> >> How does your patch help libbpf-rs?
>> >>
>> >> Please don't top post.
>> >
>> >
>> > You will be able to implement a custom perf buffer consumer, as it already has good bindings with libbpf-sys which is built from the C headers
>> >
>> > Sorry for the top posting I'm not home and replying from my phone
>> >
>>
>> I can see us exposing per-CPU buffers for (very) advanced users, something like:
>>
>> int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void
>> **buf, size_t buf_sz);
>
>
> Not sure I'm fully following what this API does, you will get a pointer to a message in the ring buffer?
> If so how do you consume without setting up a new tail?
>
> Or do you get a full copy of the current ring buffer (because that will mean you would have to alloc and copy which might hurt performance), but in that case you no longer a set tail or drain function.

No, it returns a pointer to mmap()'ed per-CPU buffer memory, including
its header page which contains head/tail positions. As I said, it's
for an advanced user, you need to know the layout and how to consume
data.

>
> Also perhaps regardless if this patchset will be approved or not it would probably be nice to have something like
> int perf_buffer__state(perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, size_t *free_space, size_t *used_space);
>
> Cheers,
> --Jon.
>
>>
>> Then in combination with perf_buffer__buffer_fd() you can implement
>> your own polling and processing. So you just use libbpf logic to setup
>> buffers, but then don't call perf_buffer__poll() at all and read
>> records and update tail on your own.
>>
>> But this combination of perf_buffer__raw_ring_buf() and
>> perf_buffer__set_ring_buf_tail() seems like a bad API, sorry.
>>
>>
>> >>
>> >> > Thanks,
>> >> > -- Jon.
>> >> >
>> >> > On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >> >>
>> >> >> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >
>> >> >> > On 08/07/2022, Andrii Nakryiko wrote:
>> >> >> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> > >>
>> >> >> > >> From: Jon Doron <jond@wiz.io>
>> >> >> > >>
>> >> >> > >> Add support for writing a custom event reader, by exposing the ring
>> >> >> > >> buffer state, and allowing to set it's tail.
>> >> >> > >>
>> >> >> > >> Few simple examples where this type of needed:
>> >> >> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> >> >> > >>    to handle the wrap-around in some other way.
>> >> >> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> >> >> > >>    guarnteed, for example:
>> >> >> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> >> >> > >>    and the events are spread on more than 1 CPU, then we can end
>> >> >> > >>    up with the following state in the ring buf:
>> >> >> > >>    CPU[0] => [t0, t2]
>> >> >> > >>    CPU[1] => [t1]
>> >> >> > >>    When you consume the events from CPU[0], you could know there is
>> >> >> > >>    a t1 missing, (assuming there are no drops, and your event data
>> >> >> > >>    contains a sequential index).
>> >> >> > >>    So now one can simply do the following, for CPU[0], you can store
>> >> >> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> >> >> > >>    there data is not perished) then move on the CPU[1] and set the
>> >> >> > >>    address of t1 in the same array.
>> >> >> > >>    So you end up with something like:
>> >> >> > >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>> >> >> > >>    and move the tails as you process in order.
>> >> >> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> >> >> > >>    messages from them, then we can "pick" with which one to start with
>> >> >> > >>    according to the remaining free space in the ring buffer.
>> >> >> > >>
>> >> >> > >
>> >> >> > >All the above use cases are sufficiently advanced that you as such an
>> >> >> > >advanced user should be able to write your own perfbuf consumer code.
>> >> >> > >There isn't a lot of code to set everything up, but then you get full
>> >> >> > >control over all the details.
>> >> >> > >
>> >> >> > >I don't see this API as a generally useful, it feels way too low-level
>> >> >> > >and special for inclusion in libbpf.
>> >> >> > >
>> >> >> >
>> >> >> > Hi Andrii,
>> >> >> >
>> >> >> > I understand, but I was still hoping you will be willing to expose this
>> >> >> > API.
>> >> >> > libbpf has very simple and nice binding to Rust and other languages,
>> >> >> > implementing one of those use cases in the bindings can make things much
>> >> >> > simpler than using some libc or syscall APIs, instead of enjoying all
>> >> >> > the simplicity that you get for free in libbpf.
>> >> >> >
>> >> >> > Hope you will be willing to reconsider :)
>> >> >>
>> >> >> The discussion would have been different if you mentioned that
>> >> >> motivation in the commit logs.
>> >> >> Please provide links to "Rust and other languages" code that
>> >> >> uses this api.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
       [not found]                     ` <CAP7QCojcQ0kGMDO2BZH+zea7tKiWQqx_qo0KZ028hfX_2WLs9A@mail.gmail.com>
@ 2022-07-12 21:03                       ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2022-07-12 21:03 UTC (permalink / raw)
  To: Jon Doron
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron

On Mon, Jul 11, 2022 at 10:47 PM Jon Doron <arilou@gmail.com> wrote:
>
>
>
> On Tue, Jul 12, 2022, 07:50 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>
>> On Mon, Jul 11, 2022 at 9:19 PM Jon Doron <arilou@gmail.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jul 12, 2022, 07:01 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>> >>
>> >> On Sun, Jul 10, 2022 at 10:07 AM Jon Doron <arilou@gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Sun, Jul 10, 2022, 18:16 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >> >>
>> >> >> On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >
>> >> >> > I was referring to the following:
>> >> >> > https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs
>> >> >>
>> >> >> How does your patch help libbpf-rs?
>> >> >>
>> >> >> Please don't top post.
>> >> >
>> >> >
>> >> > You will be able to implement a custom perf buffer consumer, as it already has good bindings with libbpf-sys which is built from the C headers
>> >> >
>> >> > Sorry for the top posting I'm not home and replying from my phone
>> >> >
>> >>
>> >> I can see us exposing per-CPU buffers for (very) advanced users, something like:
>> >>
>> >> int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void
>> >> **buf, size_t buf_sz);
>> >
>> >
>> > Not sure I'm fully following what this API does, you will get a pointer to a message in the ring buffer?
>> > If so how do you consume without setting up a new tail?
>> >
>> > Or do you get a full copy of the current ring buffer (because that will mean you would have to alloc and copy which might hurt performance), but in that case you no longer a set tail or drain function.
>>
>> No, it returns a pointer to mmap()'ed per-CPU buffer memory, including
>> its header page which contains head/tail positions. As I said, it's
>> for an advanced user, you need to know the layout and how to consume
>> data.
>
>
> Oh I see well that sounds perfect to me, do you want me to send a patch? (I'm currently on vacation abroad so I don't have access to my PC but I can get to it later this week or during the weekend

Yes, please do when you get a chance.

>
>>
>> >
>> > Also perhaps regardless if this patchset will be approved or not it would probably be nice to have something like
>> > int perf_buffer__state(perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, size_t *free_space, size_t *used_space);
>> >
>> > Cheers,
>> > --Jon.
>> >
>> >>
>> >> Then in combination with perf_buffer__buffer_fd() you can implement
>> >> your own polling and processing. So you just use libbpf logic to setup
>> >> buffers, but then don't call perf_buffer__poll() at all and read
>> >> records and update tail on your own.
>> >>
>> >> But this combination of perf_buffer__raw_ring_buf() and
>> >> perf_buffer__set_ring_buf_tail() seems like a bad API, sorry.
>> >>
>> >>
>> >> >>
>> >> >> > Thanks,
>> >> >> > -- Jon.
>> >> >> >
>> >> >> > On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > On 08/07/2022, Andrii Nakryiko wrote:
>> >> >> >> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >> > >>
>> >> >> >> > >> From: Jon Doron <jond@wiz.io>
>> >> >> >> > >>
>> >> >> >> > >> Add support for writing a custom event reader, by exposing the ring
>> >> >> >> > >> buffer state, and allowing to set it's tail.
>> >> >> >> > >>
>> >> >> >> > >> Few simple examples where this type of needed:
>> >> >> >> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> >> >> >> > >>    to handle the wrap-around in some other way.
>> >> >> >> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> >> >> >> > >>    guarnteed, for example:
>> >> >> >> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> >> >> >> > >>    and the events are spread on more than 1 CPU, then we can end
>> >> >> >> > >>    up with the following state in the ring buf:
>> >> >> >> > >>    CPU[0] => [t0, t2]
>> >> >> >> > >>    CPU[1] => [t1]
>> >> >> >> > >>    When you consume the events from CPU[0], you could know there is
>> >> >> >> > >>    a t1 missing, (assuming there are no drops, and your event data
>> >> >> >> > >>    contains a sequential index).
>> >> >> >> > >>    So now one can simply do the following, for CPU[0], you can store
>> >> >> >> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> >> >> >> > >>    there data is not perished) then move on the CPU[1] and set the
>> >> >> >> > >>    address of t1 in the same array.
>> >> >> >> > >>    So you end up with something like:
>> >> >> >> > >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>> >> >> >> > >>    and move the tails as you process in order.
>> >> >> >> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> >> >> >> > >>    messages from them, then we can "pick" with which one to start with
>> >> >> >> > >>    according to the remaining free space in the ring buffer.
>> >> >> >> > >>
>> >> >> >> > >
>> >> >> >> > >All the above use cases are sufficiently advanced that you as such an
>> >> >> >> > >advanced user should be able to write your own perfbuf consumer code.
>> >> >> >> > >There isn't a lot of code to set everything up, but then you get full
>> >> >> >> > >control over all the details.
>> >> >> >> > >
>> >> >> >> > >I don't see this API as a generally useful, it feels way too low-level
>> >> >> >> > >and special for inclusion in libbpf.
>> >> >> >> > >
>> >> >> >> >
>> >> >> >> > Hi Andrii,
>> >> >> >> >
>> >> >> >> > I understand, but I was still hoping you will be willing to expose this
>> >> >> >> > API.
>> >> >> >> > libbpf has very simple and nice binding to Rust and other languages,
>> >> >> >> > implementing one of those use cases in the bindings can make things much
>> >> >> >> > simpler than using some libc or syscall APIs, instead of enjoying all
>> >> >> >> > the simplicity that you get for free in libbpf.
>> >> >> >> >
>> >> >> >> > Hope you will be willing to reconsider :)
>> >> >> >>
>> >> >> >> The discussion would have been different if you mentioned that
>> >> >> >> motivation in the commit logs.
>> >> >> >> Please provide links to "Rust and other languages" code that
>> >> >> >> uses this api.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-07-12 21:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-08  6:04 [PATCH bpf-next v2 0/1] libbpf: perfbuf custom event reader Jon Doron
2022-07-08  6:04 ` [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers Jon Doron
2022-07-08 22:48   ` Andrii Nakryiko
2022-07-09  2:53     ` Jon Doron
2022-07-10  5:23       ` Alexei Starovoitov
     [not found]         ` <CAP7QCohvoDZ0sk6sqA3112xsM4xzUc9uRHXiDNyt7M4jc3oUmg@mail.gmail.com>
2022-07-10 15:15           ` Alexei Starovoitov
     [not found]             ` <CAP7QCogdYrsfGvEvhg5R8rQvWDe=o-nxgmqubZtfucH1zNc-RA@mail.gmail.com>
2022-07-12  4:01               ` Andrii Nakryiko
     [not found]                 ` <CAP7QCog2hiDsb1Z-hNNsUPTja3hXfNa-auv1yrwb0YWWrymWow@mail.gmail.com>
2022-07-12  4:50                   ` Andrii Nakryiko
     [not found]                     ` <CAP7QCojcQ0kGMDO2BZH+zea7tKiWQqx_qo0KZ028hfX_2WLs9A@mail.gmail.com>
2022-07-12 21:03                       ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.