BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Kris Van Hees <kris.van.hees@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kris Van Hees <kris.van.hees@oracle.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	dtrace-devel@oss.oracle.com, linux-kernel@vger.kernel.org,
	rostedt@goodmis.org, mhiramat@kernel.org, acme@kernel.org,
	ast@kernel.org, daniel@iogearbox.net, peterz@infradead.org
Subject: Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use
Date: Mon, 17 Jun 2019 21:25:09 -0400
Message-ID: <20190618012509.GF8794@oracle.com> (raw)
In-Reply-To: <20190523202842.ij2quhpmem3nabii@ast-mbp.dhcp.thefacebook.com>

On Thu, May 23, 2019 at 01:28:44PM -0700, Alexei Starovoitov wrote:

<< stuff skipped because it is not relevant to the technical discussion... >>

> > > In particular you brought up a good point that there is a use case
> > > for sharing a piece of bpf program between kprobe and tracepoint events.
> > > The better way to do that is via bpf2bpf call.
> > > Example:
> > > void bpf_subprog(arbitrary args)
> > > {
> > > }
> > > 
> > > SEC("kprobe/__set_task_comm")
> > > int bpf_prog_kprobe(struct pt_regs *ctx)
> > > {
> > >   bpf_subprog(...);
> > > }
> > > 
> > > SEC("tracepoint/sched/sched_switch")
> > > int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> > > {
> > >   bpf_subprog(...);
> > > }
> > > 
> > > Such configuration is not supported by the verifier yet.
> > > We've been discussing it for some time, but no work has started,
> > > since there was no concrete use case.
> > > If you can work on adding support for it everyone will benefit.
> > > 
> > > Could you please consider doing that as a step forward?
> > 
> > This definitely looks to be an interesting addition and I am happy to look into
> > that further.  I have a few questions that I hope you can shed light on...
> > 
> > 1. What context would bpf_subprog execute with?  If it can be called from
> >    multiple different prog types, would it see whichever context the caller
> >    is executing with?  Or would you envision bpf_subprog to not be allowed to
> >    access the execution context because it cannot know which one is in use?
> 
> bpf_subprog() won't be able to access 'ctx' pointer _if_ it's ambiguous.
> The verifier already smart enough to track all the data flow, so it's fine to
> pass 'struct pt_regs *ctx' as long as it's accessed safely.
> For example:
> void bpf_subprog(int kind, struct pt_regs *ctx1, struct sched_switch_args *ctx2)
> {
>   if (kind == 1)
>      bpf_printk("%d", ctx1->pc);
>   if (kind == 2)
>      bpf_printk("%d", ctx2->next_pid);
> }
> 
> SEC("kprobe/__set_task_comm")
> int bpf_prog_kprobe(struct pt_regs *ctx)
> {
>   bpf_subprog(1, ctx, NULL);
> }
> 
> SEC("tracepoint/sched/sched_switch")
> int bpf_prog_tracepoint(struct sched_switch_args *ctx)
> {
>   bpf_subprog(2, NULL, ctx);
> }
> 
> The verifier should be able to prove that the above is correct.
> It can do so already if s/ctx1/map_value1/, s/ctx2/map_value2/
> What's missing is an ability to have more than one 'starting' or 'root caller'
> program.
> 
> Now replace SEC("tracepoint/sched/sched_switch") with SEC("cgroup/ingress")
> and it's becoming clear that BPF_PROG_TYPE_PROBE approach is not good enough, right?
> Folks are already sharing the bpf progs between kprobe and networking.
> Currently it's done via code duplication and actual sharing happens via maps.
> That's not ideal, hence we've been discussing 'shared library' approach for
> quite some time. We need a way to support common bpf functions that can be called
> from networking and from tracing programs.
> 
> > 2. Given that BPF programs are loaded with a specification of the prog type, 
> >    how would one load a code construct as the one you outline above?  How can
> >    you load a BPF function and have it be used as subprog from programs that
> >    are loaded separately?  I.e. in the sample above, if bpf_subprog is loaded
> >    as part of loading bpf_prog_kprobe (prog type KPROBE), how can it be
> >    referenced from bpf_prog_tracepoint (prog type TRACEPOINT) which would be
> >    loaded separately?
> 
> The api to support shared libraries was discussed, but not yet implemented.
> We've discussed 'FD + name' approach.
> FD identifies a loaded program (which is root program + a set of subprogs)
> and other programs can be loaded at any time later. The BPF_CALL instructions
> in such later program would refer to older subprogs via FD + name.
> Note that both tracing and networking progs can be part of single elf file.
> libbpf has to be smart to load progs into kernel step by step
> and reusing subprogs that are already loaded.
> 
> Note that libbpf work for such feature can begin _without_ kernel changes.
> libbpf can pass bpf_prog_kprobe+bpf_subprog as a single program first,
> then pass bpf_prog_tracepoint+bpf_subprog second (as a separate program).
> The bpf_subprog will be duplicated and JITed twice, but sharing will happen
> because data structures (maps, global and static data) will be shared.
> This way the support for 'pseudo shared libraries' can begin.
> (later accompanied by FD+name kernel support)

As far as I can determine, the current libbpd implementation is already able
to do the duplication of the called function, even when the ELF object contains
programs of differemt program types.  I.e. the example you give at the top
of the email actually seems to work already.  Right?

In that case, I am a bit unsure what more can be done on the side of libbpf
without needing kernel changes?

> There are other things we discsused. Ideally the body of bpf_subprog()
> wouldn't need to be kept around for future verification when this bpf
> function is called by a different program. The idea was to
> use BTF and similar mechanism to ongoing 'bounded loop' work.
> So the verifier can analyze bpf_subprog() once and reuse that knowledge
> for dynamic linking with progs that will be loaded later.
> This is more long term work.
> A simple short term would be to verify the full call chain every time
> the subprog (bpf function) is reused.

  parent reply index

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-20 23:47 Kris Van Hees
2019-05-21 17:56 ` Alexei Starovoitov
2019-05-21 18:41   ` Kris Van Hees
2019-05-21 20:55     ` Alexei Starovoitov
2019-05-21 21:36       ` Steven Rostedt
2019-05-21 21:43         ` Alexei Starovoitov
2019-05-21 21:48           ` Steven Rostedt
2019-05-22  5:23             ` Kris Van Hees
2019-05-22 20:53               ` Alexei Starovoitov
2019-05-23  5:46                 ` Kris Van Hees
2019-05-23 21:13                   ` Alexei Starovoitov
2019-05-23 23:02                     ` Steven Rostedt
2019-05-24  0:31                       ` Alexei Starovoitov
2019-05-24  1:57                         ` Steven Rostedt
2019-05-24  2:08                           ` Alexei Starovoitov
2019-05-24  2:40                             ` Steven Rostedt
2019-05-24  5:26                             ` Kris Van Hees
2019-05-24  5:10                       ` Kris Van Hees
2019-05-24  4:05                     ` Kris Van Hees
2019-05-24 13:28                       ` Steven Rostedt
2019-05-21 21:36       ` Kris Van Hees
2019-05-21 23:26         ` Alexei Starovoitov
2019-05-22  4:12           ` Kris Van Hees
2019-05-22 20:16             ` Alexei Starovoitov
2019-05-23  5:16               ` Kris Van Hees
2019-05-23 20:28                 ` Alexei Starovoitov
2019-05-30 16:15                   ` Kris Van Hees
2019-05-31 15:25                     ` Chris Mason
2019-06-06 20:58                       ` Kris Van Hees
2019-06-18  1:25                   ` Kris Van Hees [this message]
2019-06-18  1:32                     ` Alexei Starovoitov
2019-06-18  1:54                       ` Kris Van Hees
2019-06-18  3:01                         ` Alexei Starovoitov
2019-06-18  3:19                           ` Kris Van Hees
2019-05-22 14:25   ` Peter Zijlstra
2019-05-22 18:22     ` Kris Van Hees
2019-05-22 19:55       ` Alexei Starovoitov
2019-05-22 20:20         ` David Miller
2019-05-23  5:19         ` Kris Van Hees
2019-05-24  7:27       ` Peter Zijlstra
2019-05-21 20:39 ` [RFC PATCH 01/11] bpf: context casting for tail call Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 02/11] bpf: add BPF_PROG_TYPE_DTRACE Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 03/11] bpf: export proto for bpf_perf_event_output helper Kris Van Hees
     [not found] ` <facilities>
2019-05-21 20:39   ` [RFC PATCH 04/11] trace: initial implementation of DTrace based on kernel Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 05/11] trace: update Kconfig and Makefile to include DTrace Kris Van Hees
     [not found] ` <features>
2019-05-21 20:39   ` [RFC PATCH 06/11] dtrace: tiny userspace tool to exercise DTrace support Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 07/11] bpf: implement writable buffers in contexts Kris Van Hees
2019-05-21 20:39 ` [RFC PATCH 08/11] perf: add perf_output_begin_forward_in_page Kris Van Hees
     [not found] ` <the>
     [not found]   ` <context>
2019-05-21 20:39     ` [RFC PATCH 09/11] bpf: mark helpers explicitly whether they may change Kris Van Hees
     [not found] ` <helpers>
2019-05-21 20:39   ` [RFC PATCH 10/11] bpf: add bpf_buffer_reserve and bpf_buffer_commit Kris Van Hees
2019-05-21 20:40 ` [RFC PATCH 11/11] dtrace: make use of writable buffers in BPF Kris Van Hees
2019-05-21 20:48 ` [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use Kris Van Hees
2019-05-21 20:54   ` Steven Rostedt
2019-05-21 20:56   ` Alexei Starovoitov

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190618012509.GF8794@oracle.com \
    --to=kris.van.hees@oracle.com \
    --cc=acme@kernel.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dtrace-devel@oss.oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org bpf@archiver.kernel.org
	public-inbox-index bpf


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/ public-inbox