BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kris Van Hees <kris.van.hees@oracle.com>
Cc: Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	dtrace-devel@oss.oracle.com, LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use
Date: Mon, 17 Jun 2019 18:32:22 -0700
Message-ID: <CAADnVQJoH4WOQ0t7ZhLgh4kh2obxkFs0UGDRas0y4QSqh1EMsg@mail.gmail.com> (raw)
In-Reply-To: <20190618012509.GF8794@oracle.com>

On Mon, Jun 17, 2019 at 6:25 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Thu, May 23, 2019 at 01:28:44PM -0700, Alexei Starovoitov wrote:
>
> << stuff skipped because it is not relevant to the technical discussion... >>
>
> > > > In particular you brought up a good point that there is a use case
> > > > for sharing a piece of bpf program between kprobe and tracepoint events.
> > > > The better way to do that is via bpf2bpf call.
> > > > Example:
> > > > void bpf_subprog(arbitrary args)
> > > > {
> > > > }
> > > >
> > > > SEC("kprobe/__set_task_comm")
> > > > int bpf_prog_kprobe(struct pt_regs *ctx)
> > > > {
> > > >   bpf_subprog(...);
> > > > }
> > > >
> > > > SEC("tracepoint/sched/sched_switch")
> > > > int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> > > > {
> > > >   bpf_subprog(...);
> > > > }
> > > >
> > > > Such configuration is not supported by the verifier yet.
> > > > We've been discussing it for some time, but no work has started,
> > > > since there was no concrete use case.
> > > > If you can work on adding support for it everyone will benefit.
> > > >
> > > > Could you please consider doing that as a step forward?
> > >
> > > This definitely looks to be an interesting addition and I am happy to look into
> > > that further.  I have a few questions that I hope you can shed light on...
> > >
> > > 1. What context would bpf_subprog execute with?  If it can be called from
> > >    multiple different prog types, would it see whichever context the caller
> > >    is executing with?  Or would you envision bpf_subprog to not be allowed to
> > >    access the execution context because it cannot know which one is in use?
> >
> > bpf_subprog() won't be able to access 'ctx' pointer _if_ it's ambiguous.
> > The verifier already smart enough to track all the data flow, so it's fine to
> > pass 'struct pt_regs *ctx' as long as it's accessed safely.
> > For example:
> > void bpf_subprog(int kind, struct pt_regs *ctx1, struct sched_switch_args *ctx2)
> > {
> >   if (kind == 1)
> >      bpf_printk("%d", ctx1->pc);
> >   if (kind == 2)
> >      bpf_printk("%d", ctx2->next_pid);
> > }
> >
> > SEC("kprobe/__set_task_comm")
> > int bpf_prog_kprobe(struct pt_regs *ctx)
> > {
> >   bpf_subprog(1, ctx, NULL);
> > }
> >
> > SEC("tracepoint/sched/sched_switch")
> > int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> > {
> >   bpf_subprog(2, NULL, ctx);
> > }
> >
> > The verifier should be able to prove that the above is correct.
> > It can do so already if s/ctx1/map_value1/, s/ctx2/map_value2/
> > What's missing is an ability to have more than one 'starting' or 'root caller'
> > program.
> >
> > Now replace SEC("tracepoint/sched/sched_switch") with SEC("cgroup/ingress")
> > and it's becoming clear that BPF_PROG_TYPE_PROBE approach is not good enough, right?
> > Folks are already sharing the bpf progs between kprobe and networking.
> > Currently it's done via code duplication and actual sharing happens via maps.
> > That's not ideal, hence we've been discussing 'shared library' approach for
> > quite some time. We need a way to support common bpf functions that can be called
> > from networking and from tracing programs.
> >
> > > 2. Given that BPF programs are loaded with a specification of the prog type,
> > >    how would one load a code construct as the one you outline above?  How can
> > >    you load a BPF function and have it be used as subprog from programs that
> > >    are loaded separately?  I.e. in the sample above, if bpf_subprog is loaded
> > >    as part of loading bpf_prog_kprobe (prog type KPROBE), how can it be
> > >    referenced from bpf_prog_tracepoint (prog type TRACEPOINT) which would be
> > >    loaded separately?
> >
> > The api to support shared libraries was discussed, but not yet implemented.
> > We've discussed 'FD + name' approach.
> > FD identifies a loaded program (which is root program + a set of subprogs)
> > and other programs can be loaded at any time later. The BPF_CALL instructions
> > in such later program would refer to older subprogs via FD + name.
> > Note that both tracing and networking progs can be part of single elf file.
> > libbpf has to be smart to load progs into kernel step by step
> > and reusing subprogs that are already loaded.
> >
> > Note that libbpf work for such feature can begin _without_ kernel changes.
> > libbpf can pass bpf_prog_kprobe+bpf_subprog as a single program first,
> > then pass bpf_prog_tracepoint+bpf_subprog second (as a separate program).
> > The bpf_subprog will be duplicated and JITed twice, but sharing will happen
> > because data structures (maps, global and static data) will be shared.
> > This way the support for 'pseudo shared libraries' can begin.
> > (later accompanied by FD+name kernel support)
>
> As far as I can determine, the current libbpd implementation is already able
> to do the duplication of the called function, even when the ELF object contains
> programs of differemt program types.  I.e. the example you give at the top
> of the email actually seems to work already.  Right?

Have you tried it?

> In that case, I am a bit unsure what more can be done on the side of libbpf
> without needing kernel changes?

it's a bit weird to discuss hypothetical kernel changes when the first step
of changing libbpf wasn't even attempted.

  reply index

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-20 23:47 Kris Van Hees
2019-05-21 17:56 ` Alexei Starovoitov
2019-05-21 18:41   ` Kris Van Hees
2019-05-21 20:55     ` Alexei Starovoitov
2019-05-21 21:36       ` Steven Rostedt
2019-05-21 21:43         ` Alexei Starovoitov
2019-05-21 21:48           ` Steven Rostedt
2019-05-22  5:23             ` Kris Van Hees
2019-05-22 20:53               ` Alexei Starovoitov
2019-05-23  5:46                 ` Kris Van Hees
2019-05-23 21:13                   ` Alexei Starovoitov
2019-05-23 23:02                     ` Steven Rostedt
2019-05-24  0:31                       ` Alexei Starovoitov
2019-05-24  1:57                         ` Steven Rostedt
2019-05-24  2:08                           ` Alexei Starovoitov
2019-05-24  2:40                             ` Steven Rostedt
2019-05-24  5:26                             ` Kris Van Hees
2019-05-24  5:10                       ` Kris Van Hees
2019-05-24  4:05                     ` Kris Van Hees
2019-05-24 13:28                       ` Steven Rostedt
2019-05-21 21:36       ` Kris Van Hees
2019-05-21 23:26         ` Alexei Starovoitov
2019-05-22  4:12           ` Kris Van Hees
2019-05-22 20:16             ` Alexei Starovoitov
2019-05-23  5:16               ` Kris Van Hees
2019-05-23 20:28                 ` Alexei Starovoitov
2019-05-30 16:15                   ` Kris Van Hees
2019-05-31 15:25                     ` Chris Mason
2019-06-06 20:58                       ` Kris Van Hees
2019-06-18  1:25                   ` Kris Van Hees
2019-06-18  1:32                     ` Alexei Starovoitov [this message]
2019-06-18  1:54                       ` Kris Van Hees
2019-06-18  3:01                         ` Alexei Starovoitov
2019-06-18  3:19                           ` Kris Van Hees
2019-05-22 14:25   ` Peter Zijlstra
2019-05-22 18:22     ` Kris Van Hees
2019-05-22 19:55       ` Alexei Starovoitov
2019-05-22 20:20         ` David Miller
2019-05-23  5:19         ` Kris Van Hees
2019-05-24  7:27       ` Peter Zijlstra
2019-05-21 20:39 ` [RFC PATCH 01/11] bpf: context casting for tail call Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 02/11] bpf: add BPF_PROG_TYPE_DTRACE Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 03/11] bpf: export proto for bpf_perf_event_output helper Kris Van Hees
     [not found] ` <facilities>
2019-05-21 20:39   ` [RFC PATCH 04/11] trace: initial implementation of DTrace based on kernel Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 05/11] trace: update Kconfig and Makefile to include DTrace Kris Van Hees
     [not found] ` <features>
2019-05-21 20:39   ` [RFC PATCH 06/11] dtrace: tiny userspace tool to exercise DTrace support Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 07/11] bpf: implement writable buffers in contexts Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 08/11] perf: add perf_output_begin_forward_in_page Kris Van Hees
     [not found] ` <the>
     [not found]   ` <context>
2019-05-21 20:39     ` [RFC PATCH 09/11] bpf: mark helpers explicitly whether they may change Kris Van Hees
     [not found] ` <helpers>
2019-05-21 20:39   ` [RFC PATCH 10/11] bpf: add bpf_buffer_reserve and bpf_buffer_commit Kris Van Hees
2019-05-21 20:40 ` [RFC PATCH 11/11] dtrace: make use of writable buffers in BPF Kris Van Hees
2019-05-21 20:48 ` [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use Kris Van Hees
2019-05-21 20:54   ` Steven Rostedt
2019-05-21 20:56   ` Alexei Starovoitov

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQJoH4WOQ0t7ZhLgh4kh2obxkFs0UGDRas0y4QSqh1EMsg@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=acme@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dtrace-devel@oss.oracle.com \
    --cc=kris.van.hees@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org bpf@archiver.kernel.org
	public-inbox-index bpf


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/ public-inbox