BPF Archive on lore.kernel.org
From: Kris Van Hees <kris.van.hees@oracle.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Kris Van Hees <kris.van.hees@oracle.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	dtrace-devel@oss.oracle.com, linux-kernel@vger.kernel.org,
	mhiramat@kernel.org, acme@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, peterz@infradead.org
Subject: Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use
Date: Wed, 22 May 2019 01:23:27 -0400
Message-ID: <20190522052327.GN2422@oracle.com>
In-Reply-To: <20190521174757.74ec8937@gandalf.local.home>

On Tue, May 21, 2019 at 05:48:11PM -0400, Steven Rostedt wrote:
> On Tue, 21 May 2019 14:43:26 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > Steve,
> > sounds like you've missed all prior threads.
> 
> I probably have missed them ;-)
> 
> > The feedback was given to Kris it was very clear:
> > implement dtrace the same way as bpftrace is working with bpf.
> > No changes are necessary to dtrace scripts
> > and no kernel changes are necessary.
> 
> Kris, I haven't been keeping up on all the discussions. But what
> exactly is the issue where Dtrace can't be done the same way as the
> bpftrace is done?

There are several issues (and I keep finding new ones as I move forward) but
the biggest one is that I am not trying to re-design and re-implement DTrace
from the ground up.  We have an existing userspace component that is getting
modified to work with a new kernel implementation (based on BPF and various
other kernel features that are thankfully available these days).  But we need
to ensure that the userspace component continues to function exactly as one
would expect.  There should be no need to modify DTrace scripts.  Perhaps
bpftrace could be taught to parse DTrace scripts (i.e. implement the D script
language with all its bells and whistles) but it currently cannot and DTrace
obviously can.  It seems to be a better use of resources to focus on the
kernel component, where we can really provide a much cleaner implementation
for DTrace probe execution because BPF is available and very powerful.

Userspace aside, there are various features that are not currently available,
such as retrieving the ppid of the current task, and various other data items
that relate to the current task that triggered a probe.  There are ways to
work around it (using the bpf_probe_read() helper, which actually performs a
probe_kernel_read()) but that is rather clunky and definitely shouldn't be
something that can be done from a BPF program if we're doing unprivileged
tracing (which is a goal that is important for us).  New helpers can be added
for things like this, but the list grows large very quickly once you look at
what information DTrace scripts tend to use.

One of the benefits of DTrace is that probes are largely abstracted entities
when you get to the script level.  While different probes provide different
data, they are all represented as probe arguments and they are accessed in a
very consistent manner that is independent from the actual kind of probe that
triggered the execution.  Often, a single DTrace clause is associated with
multiple probes, of different types.  Probes in the kernel (kprobe, perf event,
tracepoint, ...) are associated with their own BPF program type, so it is not
possible to load the DTrace clause (translated into BPF code) once and
associate it with probes of different types.  Instead, I'd have to load it
as a BPF_PROG_TYPE_KPROBE program to associate it with a kprobe, and I'd have
to load it as a BPF_PROG_TYPE_TRACEPOINT program to associate it with a
tracepoint, and so on.  This also means that I suddenly have to add code to
the userspace component to know about the different program types with more
detail, like what helpers are available to specific program types.
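
To make the cost concrete, here is a minimal userspace sketch (not kernel or
DTrace code) that counts how many separate loads one clause needs when each
probe kind requires its own program type.  The enums are local stand-ins
invented for this example; a real tool would use enum bpf_prog_type from
<linux/bpf.h>, and prog_type_for() and loads_needed() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for illustration only (assumption, not a kernel API). */
enum probe_kind { PROBE_KPROBE, PROBE_TRACEPOINT, PROBE_PERF_EVENT, PROBE_KIND_MAX };
enum prog_type { PROG_KPROBE, PROG_TRACEPOINT, PROG_PERF_EVENT, PROG_TYPE_MAX };

/* Each probe kind is tied to exactly one program type. */
static enum prog_type prog_type_for(enum probe_kind kind)
{
    switch (kind) {
    case PROBE_KPROBE:     return PROG_KPROBE;
    case PROBE_TRACEPOINT: return PROG_TRACEPOINT;
    case PROBE_PERF_EVENT: return PROG_PERF_EVENT;
    default:               return PROG_TYPE_MAX;
    }
}

/*
 * Count how many times a single clause must be loaded so that it can be
 * attached to every probe in the list: once per distinct program type.
 */
static size_t loads_needed(const enum probe_kind *probes, size_t n)
{
    int loaded[PROG_TYPE_MAX] = { 0 };
    size_t loads = 0;

    for (size_t i = 0; i < n; i++) {
        enum prog_type t = prog_type_for(probes[i]);

        assert(t != PROG_TYPE_MAX);  /* unknown probe kind */
        if (!loaded[t]) {
            loaded[t] = 1;
            loads++;
        }
    }
    return loads;
}
```

With a clause attached to two kprobes, one tracepoint and one perf event,
loads_needed() reports three loads of the same translated clause, where a
probe-type-independent program type would need one.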

Another advantage of being able to operate on a more abstract probe concept
that is not tied to a specific probe type is that the userspace component does
not need to know about the implementation details of the specific probes.
This avoids a tight coupling between the userspace component and the kernel
implementation.

Another feature that is currently not supported is speculative tracing.  This
is a feature that is not as commonly used (although I personally have found it
to be very useful in the past couple of years) but it is quite powerful because
it allows probe data to be recorded while the decision on whether to make it
available to userspace is postponed to a later event.  At that time, the data
can be discarded or committed.
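
The record / commit / discard flow can be modelled in plain C as follows.
This is only a userspace sketch of the idea under assumed names and buffer
sizes (spec_buf, out_buf, SPEC_CAP, OUT_CAP are all hypothetical); it is not
how DTrace or the proposed BPF helpers implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPEC_CAP 256   /* hypothetical size of one speculation buffer */
#define OUT_CAP  1024  /* hypothetical size of the consumer-visible buffer */

struct spec_buf {
    unsigned char data[SPEC_CAP];
    size_t len;
};

struct out_buf {
    unsigned char data[OUT_CAP];
    size_t len;
};

/* Record probe data tentatively; returns 0 on success, -1 if it does not fit. */
static int spec_record(struct spec_buf *s, const void *rec, size_t n)
{
    assert(s != NULL && rec != NULL);
    if (n > SPEC_CAP - s->len)
        return -1;
    memcpy(s->data + s->len, rec, n);
    s->len += n;
    return 0;
}

/* A later event decided the data is wanted: publish it, then reset. */
static int spec_commit(struct spec_buf *s, struct out_buf *o)
{
    assert(s != NULL && o != NULL);
    if (s->len > OUT_CAP - o->len)
        return -1;  /* speculative data is kept so the caller can retry */
    memcpy(o->data + o->len, s->data, s->len);
    o->len += s->len;
    s->len = 0;
    return 0;
}

/* A later event decided the data is not wanted: drop it unseen. */
static void spec_discard(struct spec_buf *s)
{
    assert(s != NULL);
    s->len = 0;
}
```

Nothing reaches out_buf until spec_commit() runs, so a consumer only ever
sees data from speculations that a later probe chose to keep.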

These are just some examples of issues I have been working on.  I spent quite
a bit of time looking for ways to implement what we need for DTrace with a
minimal number of patches to the kernel because there really isn't any point
in doing unnecessary work.  I do not doubt that some of these issues can be
worked around with clever hacks, but I am not trying to hack something together
that hopefully will be close enough to the expected functionality.

DTrace has proven itself to be quite useful and dependable as a tracing
solution, and I am working on continuing to deliver on that while recognizing
the significant work that others have put into advancing the tracing
infrastructure in Linux in recent years.  So many people have contributed
excellent features - and I am making use of those features as much as I can.
But as is often the case, not everything that I need is currently implemented.
As I expressed during last year's Plumbers in Vancouver, I am putting a very
strong emphasis on ensuring that what I propose as contributions is not
limited to just DTrace.  My goal is to work in an open, collaborative manner,
providing features that anyone can use if they want to.

I wish that the assertion that "no changes are necessary to dtrace scripts and
no kernel changes are necessary" were true, but my own findings contradict
that.  To my knowledge no tool exists right now that can execute any and all
valid DTrace scripts without any changes to the scripts and without any changes
to the kernel.  The only tool I know of that can execute DTrace scripts right now
does require rather extensive kernel changes, and the work I am doing right now
is aimed at doing much better than that.

	Kris


