netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points
Date: Mon, 20 Jul 2020 20:48:04 -0700	[thread overview]
Message-ID: <CAEf4BzYtD9dGUy3hZRRAA56CaVvW7xTR9tp0dXKyVQXym046eQ@mail.gmail.com> (raw)
In-Reply-To: <20200720233455.6ito7n2eqojlfnvk@ast-mbp.dhcp.thefacebook.com>

On Mon, Jul 20, 2020 at 4:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sun, Jul 19, 2020 at 10:02:48PM -0700, Andrii Nakryiko wrote:
> > On Thu, Jul 16, 2020 at 7:06 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jul 16, 2020 at 12:50:05PM +0200, Toke Høiland-Jørgensen wrote:
> > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > > >
> > > > > On Wed, Jul 15, 2020 at 03:09:02PM +0200, Toke Høiland-Jørgensen wrote:
> > > > >>
> > > > >> +  if (tgt_prog_fd) {
> > > > >> +          /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
> > > > >> +          if (prog->type != BPF_PROG_TYPE_EXT ||
> > > > >> +              !btf_id) {
> > > > >> +                  err = -EINVAL;
> > > > >> +                  goto out_put_prog;
> > > > >> +          }
> > > > >> +          tgt_prog = bpf_prog_get(tgt_prog_fd);
> > > > >> +          if (IS_ERR(tgt_prog)) {
> > > > >> +                  err = PTR_ERR(tgt_prog);
> > > > >> +                  tgt_prog = NULL;
> > > > >> +                  goto out_put_prog;
> > > > >> +          }
> > > > >> +
> > > > >> +  } else if (btf_id) {
> > > > >> +          err = -EINVAL;
> > > > >> +          goto out_put_prog;
> > > > >> +  } else {
> > > > >> +          btf_id = prog->aux->attach_btf_id;
> > > > >> +          tgt_prog = prog->aux->linked_prog;
> > > > >> +          if (tgt_prog)
> > > > >> +                  bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */
> > > > >
> > > > > so the first prog_load cmd will beholding the first target prog?
> > > > > This is complete non starter.
> > > > > You didn't mention such decision anywhere.
> > > > > The first ext prog will attach to the first dispatcher xdp prog,
> > > > > then that ext prog will multi attach to second dispatcher xdp prog and
> > > > > the first dispatcher prog will live in the kernel forever.
> > > >
> > > > Huh, yeah, you're right that's no good. Missing that was a think-o on my
> > > > part, sorry about that :/
> > > >
> > > > > That's not what we discussed back in April.
> > > >
> > > > No, you mentioned turning aux->linked_prog into a list. However once I
> > > > started looking at it I figured it was better to actually have all this
> > > > (the trampoline and ref) as part of the bpf_link structure, since
> > > > logically they're related.
> > > >
> > > > But as you pointed out, the original reference sticks. So either that
> > > > needs to be removed, or I need to go back to the 'aux->linked_progs as a
> > > > list' idea. Any preference?
> > >
> > > Good question. Back then I was thinking about converting linked_prog into link
> > > list, since standalone single linked_prog is quite odd, because attaching ext
> > > prog to multiple tgt progs should have equivalent properties across all
> > > attachments.
> > > Back then bpf_link wasn't quite developed.
> > > Now I feel moving into bpf_tracing_link is better.
> > > I guess a link list of bpf_tracing_link-s from 'struct bpf_prog' might work.
> > > At prog load time we can do bpf_link_init() only (without doing bpf_link_prime)
> > > and keep this pre-populated bpf_link with target bpf prog and trampoline
> > > in a link list accessed from 'struct bpf_prog'.
> > > Then bpf_tracing_prog_attach() without extra tgt_prog_fd/btf_id would complete
> > > that bpf_tracing_link by calling bpf_link_prime() and bpf_link_settle()
> > > without allocating new one.
> > > Something like:
> > > struct bpf_tracing_link {
> > >         struct bpf_link link;  /* ext prog pointer is hidding in there */
> > >         enum bpf_attach_type attach_type;
> > >         struct bpf_trampoline *tr;
> > >         struct bpf_prog *tgt_prog; /* old aux->linked_prog */
> > > };
> > >
> > > ext prog -> aux -> link list of above bpf_tracing_link-s
> > >
> > > It's a circular reference, obviously.
> > > Need to think through the complications and locking.
> > >
> > > bpf_tracing_prog_attach() with tgt_prog_fd/btf_id will alloc new bpf_tracing_link
> > > and will add it to a link list.
> > >
> > > Just a rough idea. I wonder what Andrii thinks.
> > >
> >
> > I need to spend more time reading existing and new code to see all the
> > details, but I'll throw a slightly different proposal and let you guys
> > shoot it down.
> >
> > So, what if instead of having linked_prog (as bpf_prog *, refcnt'ed),
> > at BPF_PROG_LOAD time we just record the target prog's ID. BPF
> > verifier, when doing its target prog checks would attempt to get
> > bpf_prog * reference; if by that time the target program is gone,
> > fail, of course. If not, everything proceeds as is, at the end of
> > verification target_prog is put until attach time.
> >
> > Then at attach time, we either go with pre-recorded (in
> > prog->aux->linked_prog_id) target prog's ID or we get a new one from
> > RAW_TP_OPEN tgt_prog_fd. Either way, we bump refcnt on that target
> > prog and keep it with bpf_tracing_link (so link on detach would put
> > target_prog, that way it doesn't go away while EXT prog is attached).
> > Then do all the compatibility checks, and if everything works out,
> > bpf_tracing_link gets created, we record trampoline there, etc, etc.
> > Basically, instead of having an EXT prog holding a reference to the
> > target prog, only attachment (bpf_link) does that, which conceptually
> > also seems to make more sense to me. For verification we store prog ID
> > and don't hold target prog at all.
> >
> >
> > Now, there will be a problem once you attach EXT prog to a new XDP
> > root program and release a link against the original XDP root program.
> > First, I hope I understand the desired sequence right, here's an
> > example:
> >
> > 1. load XDP root prog X
> > 2. load EXT prog with target prog X
> > 3. attach EXT prog to prog X
> > 4. load XDP root prog Y
> > 5. attach EXT prog to prog Y (Y and X should be "compatible")
> > 6. detach prog X (close bpf_link)
> >
> > Is that the right sequence?
> >
> > If yes, then the problem with storing ID of prog X in EXT
> > prog->aux->linked_prog_id is that you won't be able to re-attach to
> > new prog Z, because there won't be anything to check compatibility
> > against (prog X will be long time gone).
> >
> > So we can do two things here:
> >
> > 1. on attach, replace ext_prog->aux->linked_prog_id with the latest
> > attached prog (prog Y ID from above example)
> > 2. instead of recording target program FD/ID, capture BTF FD and/or
> > enough BTF information for checking compatibility.
> >
> > Approach 2) seems like conceptually the right thing to do (record type
> > info we care about, not an **instance** of BPF program, compatible
> > with that type info), but technically might be harder.
>
> I've read your proposal couple times and still don't get what you're
> trying to solve with either ID or BTF info recording.
> So that target prog doesn't get refcnt-ed? What's a problem with it?
> Currently it's being refcnt-d in aux->linked_prog.
> What I'm proposing about is to convert aux->linked_prog into a link list
> of bpf_tracing_links which will contain linked_prog inside.
> Conceptually that's what bpf_link is doing. It links two progs.
> EXT prog is recorded in 'struct bpf_link' and
> the target prog is recorded in 'struct bpf_tracing_link'.
> So from bpf_link perspective everything seems clean to me.
> The link list of bpf_tracing_link-s in EXT_prog->aux is only to preserve
> existing api of prog_load cmd.

Right, I wanted to avoid taking a refcnt on aux->linked_prog during
PROG_LOAD. The reason for that was (and still is) that I don't get who
and when has to bpf_prog_put() original aux->linked_prog to allow the
prog X to be freed. I.e., after you re-attach to prog Y, how prog X is
released (assuming no active bpf_link is keeping it from being freed)?
That's my biggest confusion right now.

I also didn't like the idea of half-creating bpf_tracing_link on
PROG_LOAD and then turning it into a real link with bpf_link_settle on
attach. That sounded like a hack to me.

But now I'm also confused why we need to turn aux->linked_prog into a
list. Seems like we need it only for old-style attach that doesn't
specify tgt_prog_fd, no? Only in that case we'll use aux->linked_prog.
Otherwise we know the target prog from tgt_prog_fd. So I'll be honest
that I don't get the whole idea of maintaining a list of
bpf_tracing_links. It seems like it should be possible to make
bpf_tracing_link decoupled from any prog's aux and have their own
independent lifetime.

>
> As far as step 5: attach EXT prog to prog Y (Y and X should be "compatible")
> The chance of failure there should be minimal. libxdp/libdispatcher will
> prepare rootlet XDP prog. It should really make sure that Y and X are compatible.
> This should be invisible to users.

Right, of course, but the kernel needs to validate that anyways, which
is why I pointed that out. Or are you saying we should just assume
that they are valid?

>
> In addition we still need bpf_link_update_hook() I was talking about in April.
> The full sequence is:
> first user process:
>  1. load XDP root prog X
>  1' root_link = attach X to eth0
>  2. load EXT prog with target prog X
>  3. app1_link_fd = attach EXT prog to prog X
> second user process:
>  4. load XDP root prog Y
>  4'. find EXT prog of the first user process
>  5. app2_link_fd = attach EXT prog to prog Y (Y and X should be "compatible")
>  6. bpf_link_update(root_link, X, Y); // now packet flows into Y and into EXT
>    // while EXT is attached in two places
>  7. app1_link_fd' = FD in second process that points to the same tracing link
>     as app1_link_fd in the first process.
>    bpf_link_update_hook(app1_link_fd', app2_link_fd)
> the last operation need to update bpf_tracing_link that is held by app1
> (which is the first user process) from the second user process. It needs to
> retarget (update_hook) inside bpf_tracing_link from X to Y.
> Since the processes are more or less not aware of each other.
> One firewall holds link_fd that connects EXT to X,
> but the second firewall (via libxdp) is updaing that tracing link
> to re-hook EXT into Y.

Yeah, should be doable given that bpf_trampoline is independently refcounted.

  reply	other threads:[~2020-07-21  3:48 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 13:08 [PATCH bpf-next v2 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
2020-07-16 10:10   ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
2020-07-15 20:44   ` Alexei Starovoitov
2020-07-16 10:50     ` Toke Høiland-Jørgensen
2020-07-17  2:05       ` Alexei Starovoitov
2020-07-17 10:52         ` Toke Høiland-Jørgensen
2020-07-20 22:57           ` Alexei Starovoitov
2020-07-20  5:02         ` Andrii Nakryiko
2020-07-20 23:34           ` Alexei Starovoitov
2020-07-21  3:48             ` Andrii Nakryiko [this message]
2020-07-22  0:29               ` Alexei Starovoitov
2020-07-22  6:02                 ` Andrii Nakryiko
2020-07-23  0:32                   ` Alexei Starovoitov
2020-07-23  0:56                     ` Andrii Nakryiko
2020-07-16 10:14   ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzYtD9dGUy3hZRRAA56CaVvW7xTR9tp0dXKyVQXym046eQ@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).