bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Edward Cree" <ecree@solarflare.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Marek Majkowski" <marek@cloudflare.com>,
	"Alan Maguire" <alan.maguire@oracle.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"David Miller" <davem@davemloft.net>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: static and dynamic linking. Was: [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF
Date: Fri, 15 Nov 2019 15:02:31 -0800	[thread overview]
Message-ID: <20191115230229.qd6plwnvrmcm4pfo@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CACAyw98dcpu1b2nUf7ize2SJGJGd=mhqRK+PYQTx96gSBtbkNQ@mail.gmail.com>

On Fri, Nov 15, 2019 at 11:48:42AM +0000, Lorenz Bauer wrote:
> On Tue, 12 Nov 2019 at 02:51, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > This is static linking. The existing kernel infrastructure already supports
> > such model and I think it's enough for a lot of use cases. In particular fb's
> > firewall+katran XDP style will fit right in. But bpf_tail_calls are
> > incompatible with bpf2bpf calls that static linking will use and I think
> > cloudlfare folks expressed the interest to use them for some reason even within
> > single firewall ? so we need to improve the model a bit.
> We several components that we'd like to keep (logically) separate. At a high
> level, our rootlet would look like this:
>   sample_packets(ctx);
>   if (ddos_mitigate(ctx) != XDP_PASS) {
>      return XDP_DROP;
>   }
>   return l4lb(ctx);
> I think we could statically link ddos_mitigate() together from
> multiple separate .o.
> It depends on how complicated our rules become. Maybe we'd use dynamic linking,
> to reduce the overhead of re-verification.
> The rootlet would use dynamic linking, to be able to debug / inspect
> sampling, ddos
> mitigation and the l4lb separately. Combined with the ability to hook
> arbitrary BPF
> programs at entry / exit we could probably get rid of our tail_call
> use. I don't think
> you have to change the model for us to fit into it.
> > We can introduce dynamic linking. The second part of 'BPF trampoline' patches
> > allows tracing programs to attach to other BPF programs. The idea of dynamic
> > linking is to replace a program or subprogram instead of attaching to it.
> Reading the rest of the thread, I'm on board with type 2 of dynamic linking
> (load time linking?) However, type 1 (run time linking) I'm not so sure about.
> Specifically, the callee holding onto the caller instead of vice versa.
> Picking up your rootlet and fw1 example: fw1 holds the refcount on rootlet.
> This means user space needs to hold the refcount on fw1 to make sure
> the override is kept. This in turn means either: hold on to the file descriptor
> or pin the program into a bpffs. The former implies a persistent process,
> which doesn't work for tc. 

Pinning the program alone is not enough.
The folks have requested a feature to pin raw_tp_fd into bpffs.
I'm very much in favor of that. imo all other 'link fd' should be pinnable.
Then auto-detach won't happen on close() of such 'link fd'.

> The latter makes  lifetime management of fw1 hard:
> there is no way to have the kernel automatically deallocate it when it no longer
> needed, aka when the rootlet refcount reaches zero. 

hmm. Not sure I follow. See below.

> It also overloads close()
> to automatically detach the replaced / overridden BPF, which is contrary to how
> other BPF hook points work.

imo all bpf attach api-s that are not FD-based are fragile and error prone
operationally. We've seen people adding a ton of TC ingress progs because of
bugs. Then there were issues with roolet being removed due to bugs. The issues
with overriding wrong entries in prog_array. When multiple teams working on
common infra having globally visible and unprotected state is dangerous. imo
XDP and TC have to move to FD based api. When application holds the 'link fd'
that attached program will not be detached by anything else accidentally.
The operation 'attach rootlet XDP prog to netdev eth0' should return FD.
While that FD is held by application (or pinned in bpffs) nothing should be
able to override XDP prog on that eth0. We don't have such api yet, but I think
it's necessary. Same thing with replacing rootlet's placeholder subprogram with
fw1. When fw1's application links fw1 prog into rootlet nothing should be able
to break that attachment. But if fw1 application crashes that fw1 prog will be
auto-detached from rootlet. The admin can ssh into the box and kill fw1. The
packets will flow into rootlet and will flow into dummy placeholder. No
cleanups to worry about.

In case of fentry/fexit that 'link fd' protects the connection between
tracer prog and tracee prog. Whether tracer has a link to tracee or vice
versa doesn't matter. That's kernel internal implementation. The
uni-directional link avoids circular issues.

> I'd much prefer if the API didn't require attach_prog_fd and id at
> load time, and
> rather have an explicit replace_sub_prog(prog_fd, btf_id, sub_prog_fd).

The verifier has to see the target prog and its full BTF at load time. The
fentry prog needs target prog's BTF. XDP replacement prog needs target prog's
BTF too. So prog_fd+btf_id has to be passed at load time. I think
tgt_prog->refcnt++ should be done at load time too. The ugly alternative would
be to do tgt_prog->refcnt++ before verification. Then after verification
duplicate tgt_prog BTF, store it somewhere, and do tgr_prog->refcnt--. Then
later during attach/replace compare saved BTF with tgt_prog's BTF. That's imo a
ton of unncessary work for the kernel.

> > [...] This rootlet.o
> > can be automatically generated by libxdp.so. If in the future we figure out how
> > to do two load-balancers libxdp.so will be able to accommodate that new policy.
> > This firewall1.o can be developed and tested independently of other xdp
> > programs. The key gotcha here is that the verifier needs to allow more than 512
> > stack usage for the rootlet.o. I think that's acceptable.
> How would the verifier know which programs are allowed to have larger stacks?

Excellent question. I was thinking that every individually verified unit will
be under existing 512 limitation. The composition needs to check for cycles in
the attach graph and for maximum stack. The maximum stack will have some limit.
That's minor. More interesting problem to solve is how to check for cycles.

  reply	other threads:[~2019-11-15 23:02 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-07 17:20 [PATCH bpf-next v3 0/5] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
2019-10-07 17:20 ` [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF programs after each other Toke Høiland-Jørgensen
2019-10-07 20:42   ` Alexei Starovoitov
2019-10-08  8:07     ` Toke Høiland-Jørgensen
2019-10-09  1:51       ` Alexei Starovoitov
2019-10-09  8:03         ` Toke Høiland-Jørgensen
2019-10-10  4:41           ` Alexei Starovoitov
2019-10-14 12:35             ` Toke Høiland-Jørgensen
2019-10-14 17:08               ` John Fastabend
2019-10-14 18:48                 ` Toke Høiland-Jørgensen
2019-10-15 16:30                   ` Edward Cree
2019-10-15 16:42                     ` Toke Høiland-Jørgensen
2019-10-15 18:33                       ` Edward Cree
2019-10-17 12:11                         ` Toke Høiland-Jørgensen
2019-10-22 17:27                           ` Edward Cree
2019-10-22 18:07                             ` Toke Høiland-Jørgensen
2019-11-12  2:51                               ` static and dynamic linking. Was: [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF Alexei Starovoitov
2019-11-12 16:20                                 ` Toke Høiland-Jørgensen
2019-11-12 19:52                                   ` Alexei Starovoitov
2019-11-12 21:25                                     ` Edward Cree
2019-11-12 23:18                                       ` Alexei Starovoitov
2019-11-13 18:30                                         ` Edward Cree
2019-11-13 18:51                                           ` Andrii Nakryiko
2019-11-15  2:13                                           ` Alexei Starovoitov
2019-11-15 16:56                                             ` John Fastabend
2019-11-12 23:25                                     ` John Fastabend
2019-11-13  0:21                                       ` Alexei Starovoitov
2019-11-13  5:33                                         ` John Fastabend
2019-11-15  1:50                                           ` Alexei Starovoitov
2019-11-15 16:39                                             ` John Fastabend
2019-11-14 15:41                                     ` Toke Høiland-Jørgensen
2019-11-12 16:32                                 ` Edward Cree
2019-11-15 11:48                                 ` Lorenz Bauer
2019-11-15 23:02                                   ` Alexei Starovoitov [this message]
2019-11-18 13:29                                     ` Lorenz Bauer
2019-10-21 23:51                         ` [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF programs after each other Edward Cree
2019-10-16  2:28               ` Alexei Starovoitov
2019-10-16  8:27                 ` Jesper Dangaard Brouer
2019-10-16 10:35                   ` Daniel Borkmann
2019-10-16 11:16                     ` Toke Høiland-Jørgensen
2019-10-16 13:51                 ` Toke Høiland-Jørgensen
2019-10-19 20:09                   ` bpf indirect calls Alexei Starovoitov
2019-10-20 10:58                     ` Toke Høiland-Jørgensen
2019-10-25 16:30                       ` Alexei Starovoitov
2019-10-27 12:15                         ` Toke Høiland-Jørgensen
2023-09-27 13:27                     ` Matt Bobrowski
2023-09-29 21:06                       ` Alexei Starovoitov
2023-10-02 18:50                         ` Barret Rhoden
2023-10-06  9:36                         ` Matt Bobrowski
2023-10-06 18:49                           ` Alexei Starovoitov
2023-10-19 12:28                             ` Matt Bobrowski
2019-10-09 10:19         ` [PATCH bpf-next v3 1/5] bpf: Support chain calling multiple BPF programs after each other Jesper Dangaard Brouer
2019-10-09 17:57           ` Alexei Starovoitov
2019-10-07 17:20 ` [PATCH bpf-next v3 2/5] bpf: Add support for setting chain call sequence for programs Toke Høiland-Jørgensen
2019-10-07 20:38   ` Daniel Borkmann
2019-10-08  8:09     ` Toke Høiland-Jørgensen
2019-10-07 17:20 ` [PATCH bpf-next v3 3/5] tools: Update bpf.h header for program chain calls Toke Høiland-Jørgensen
2019-10-07 17:20 ` [PATCH bpf-next v3 4/5] libbpf: Add syscall wrappers for BPF_PROG_CHAIN_* commands Toke Høiland-Jørgensen
2019-10-07 17:20 ` [PATCH bpf-next v3 5/5] selftests: Add tests for XDP chain calls Toke Høiland-Jørgensen
2019-10-07 18:58 ` [PATCH bpf-next v3 0/5] xdp: Support multiple programs on a single interface through " John Fastabend
2019-10-08  8:42   ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191115230229.qd6plwnvrmcm4pfo@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=alan.maguire@oracle.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ecree@solarflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=lmb@cloudflare.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).