BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Kris Van Hees <kris.van.hees@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	dtrace-devel@oss.oracle.com, linux-kernel@vger.kernel.org,
	rostedt@goodmis.org, mhiramat@kernel.org, acme@kernel.org,
	ast@kernel.org, daniel@iogearbox.net, peterz@infradead.org
Subject: Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use
Date: Tue, 21 May 2019 14:41:37 -0400
Message-ID: <20190521184137.GH2422@oracle.com> (raw)
In-Reply-To: <20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com>

On Tue, May 21, 2019 at 10:56:18AM -0700, Alexei Starovoitov wrote:
> On Mon, May 20, 2019 at 11:47:00PM +0000, Kris Van Hees wrote:
> > 
> >     2. bpf: add BPF_PROG_TYPE_DTRACE
> > 
> > 	This patch adds BPF_PROG_TYPE_DTRACE as a new BPF program type, without
> > 	actually providing an implementation.  The actual implementation is
> > 	added in patch 4 (see below).  We do it this way because the
> > 	implementation is being added to the tracing subsystem as a component
> > 	that I would be happy to maintain (if merged) whereas the declaration
> > 	of the program type must be in the bpf subsystem.  Since the two
> > 	subsystems are maintained by different people, we split the
> > 	implementing patches across maintainer boundaries while ensuring that
> > 	the kernel remains buildable between patches.
> 
> None of these kernel patches are necessary for what you want to achieve.

I disagree.  The current support for BPF programs for probes associates a
specific BPF program type with a specific set of probes, which means that I
cannot write BPF programs based on a more general concept of a 'DTrace probe'
and provide functionality based on that.  It also means that if I have a D
clause (DTrace probe action code associated with probes) that is to be executed
for a list of probes of different types, I need to duplicate the program
because I cannot cross program type boundaries.

By implementing a program type for DTrace, and making it possible for
tail-calls to be made from various probe-specific program types to the DTrace
program type, I can accomplish what I described above.  More details are in
the cover letter and the commit messages of the individual patches.

The reasons for these patches is because I cannot do the same with the existing
implementation.  Yes, I can do some of it or use some workarounds to accomplish
kind of the same thing, but at the expense of not being able to do what I need
to do but rather do some kind of best effort alternative.  That is not the goal
here.

> Feel free to add tools/dtrace/ directory and maintain it though.

Thank you.

> The new dtrace_buffer doesn't need to replicate existing bpf+kernel functionality
> and no changes are necessary in kernel/events/ring_buffer.c either.
> tools/dtrace/ user space component can use either per-cpu array map
> or hash map as a buffer to store arbitrary data into and use
> existing bpf_perf_event_output() to send it to user space via perf ring buffer.
> 
> See, for example, how bpftrace does that.

When using bpf_perf_event_output() you need to construct the sample first,
and then send it off to user space using the perf ring-buffer.  That is extra
work that is unnecessary.  Also, storing arbitrary data from userspace in maps
is not relevant here because this is about data that is generated at the level
of the kernel and sent to userspace as part of the probe action that is
executed when the probe fires.

Bpftrace indeed uses maps and ways to construct the sample and then uses the
perf ring-buffer to pass data to userspace.  And that is not the way DTrace
works and that is not the mechanism that we need here,  So, while this may be
satisfactory for bpftrace, it is not for DTrace.  We need more fine-grained
control over how we write data to the buffer (doing direct stores from BPF
code) and without the overhead of constructing a complete sample that can just
be handed over to bpf_perf_event_output().

Also, please note that I am not duplicating any kernel functionality when it
comes to buffer handling, and in fact, I found it very easy to be able to
tap into the perf event ring-buffer implementation and add a feature that I
need for DTrace.  That was a very pleasant experience for sure!

Kris

  reply index

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-20 23:47 Kris Van Hees
2019-05-21 17:56 ` Alexei Starovoitov
2019-05-21 18:41   ` Kris Van Hees [this message]
2019-05-21 20:55     ` Alexei Starovoitov
2019-05-21 21:36       ` Steven Rostedt
2019-05-21 21:43         ` Alexei Starovoitov
2019-05-21 21:48           ` Steven Rostedt
2019-05-22  5:23             ` Kris Van Hees
2019-05-22 20:53               ` Alexei Starovoitov
2019-05-23  5:46                 ` Kris Van Hees
2019-05-23 21:13                   ` Alexei Starovoitov
2019-05-23 23:02                     ` Steven Rostedt
2019-05-24  0:31                       ` Alexei Starovoitov
2019-05-24  1:57                         ` Steven Rostedt
2019-05-24  2:08                           ` Alexei Starovoitov
2019-05-24  2:40                             ` Steven Rostedt
2019-05-24  5:26                             ` Kris Van Hees
2019-05-24  5:10                       ` Kris Van Hees
2019-05-24  4:05                     ` Kris Van Hees
2019-05-24 13:28                       ` Steven Rostedt
2019-05-21 21:36       ` Kris Van Hees
2019-05-21 23:26         ` Alexei Starovoitov
2019-05-22  4:12           ` Kris Van Hees
2019-05-22 20:16             ` Alexei Starovoitov
2019-05-23  5:16               ` Kris Van Hees
2019-05-23 20:28                 ` Alexei Starovoitov
2019-05-30 16:15                   ` Kris Van Hees
2019-05-31 15:25                     ` Chris Mason
2019-06-06 20:58                       ` Kris Van Hees
2019-06-18  1:25                   ` Kris Van Hees
2019-06-18  1:32                     ` Alexei Starovoitov
2019-06-18  1:54                       ` Kris Van Hees
2019-06-18  3:01                         ` Alexei Starovoitov
2019-06-18  3:19                           ` Kris Van Hees
2019-05-22 14:25   ` Peter Zijlstra
2019-05-22 18:22     ` Kris Van Hees
2019-05-22 19:55       ` Alexei Starovoitov
2019-05-22 20:20         ` David Miller
2019-05-23  5:19         ` Kris Van Hees
2019-05-24  7:27       ` Peter Zijlstra
2019-05-21 20:39 ` [RFC PATCH 01/11] bpf: context casting for tail call Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 02/11] bpf: add BPF_PROG_TYPE_DTRACE Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 03/11] bpf: export proto for bpf_perf_event_output helper Kris Van Hees
     [not found] ` <facilities>
2019-05-21 20:39   ` [RFC PATCH 04/11] trace: initial implementation of DTrace based on kernel Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 05/11] trace: update Kconfig and Makefile to include DTrace Kris Van Hees
     [not found] ` <features>
2019-05-21 20:39   ` [RFC PATCH 06/11] dtrace: tiny userspace tool to exercise DTrace support Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 07/11] bpf: implement writable buffers in contexts Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 08/11] perf: add perf_output_begin_forward_in_page Kris Van Hees
     [not found] ` <the>
     [not found]   ` <context>
2019-05-21 20:39     ` [RFC PATCH 09/11] bpf: mark helpers explicitly whether they may change Kris Van Hees
     [not found] ` <helpers>
2019-05-21 20:39   ` [RFC PATCH 10/11] bpf: add bpf_buffer_reserve and bpf_buffer_commit Kris Van Hees
2019-05-21 20:40 ` [RFC PATCH 11/11] dtrace: make use of writable buffers in BPF Kris Van Hees
2019-05-21 20:48 ` [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use Kris Van Hees
2019-05-21 20:54   ` Steven Rostedt
2019-05-21 20:56   ` Alexei Starovoitov

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190521184137.GH2422@oracle.com \
    --to=kris.van.hees@oracle.com \
    --cc=acme@kernel.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dtrace-devel@oss.oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org bpf@archiver.kernel.org
	public-inbox-index bpf


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/ public-inbox