All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points
Date: Sun, 19 Jul 2020 22:02:48 -0700	[thread overview]
Message-ID: <CAEf4BzYd4Xrn4EqzqHCTuJ8TnZiTC1vWWvd=9Np+LNrgbtxOcQ@mail.gmail.com> (raw)
In-Reply-To: <20200717020507.jpxxe4dbc2watsfh@ast-mbp.dhcp.thefacebook.com>

On Thu, Jul 16, 2020 at 7:06 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 16, 2020 at 12:50:05PM +0200, Toke Høiland-Jørgensen wrote:
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >
> > > On Wed, Jul 15, 2020 at 03:09:02PM +0200, Toke Høiland-Jørgensen wrote:
> > >>
> > >> +  if (tgt_prog_fd) {
> > >> +          /* For now we only allow new targets for BPF_PROG_TYPE_EXT */
> > >> +          if (prog->type != BPF_PROG_TYPE_EXT ||
> > >> +              !btf_id) {
> > >> +                  err = -EINVAL;
> > >> +                  goto out_put_prog;
> > >> +          }
> > >> +          tgt_prog = bpf_prog_get(tgt_prog_fd);
> > >> +          if (IS_ERR(tgt_prog)) {
> > >> +                  err = PTR_ERR(tgt_prog);
> > >> +                  tgt_prog = NULL;
> > >> +                  goto out_put_prog;
> > >> +          }
> > >> +
> > >> +  } else if (btf_id) {
> > >> +          err = -EINVAL;
> > >> +          goto out_put_prog;
> > >> +  } else {
> > >> +          btf_id = prog->aux->attach_btf_id;
> > >> +          tgt_prog = prog->aux->linked_prog;
> > >> +          if (tgt_prog)
> > >> +                  bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */
> > >
> > > so the first prog_load cmd will beholding the first target prog?
> > > This is complete non starter.
> > > You didn't mention such decision anywhere.
> > > The first ext prog will attach to the first dispatcher xdp prog,
> > > then that ext prog will multi attach to second dispatcher xdp prog and
> > > the first dispatcher prog will live in the kernel forever.
> >
> > Huh, yeah, you're right that's no good. Missing that was a think-o on my
> > part, sorry about that :/
> >
> > > That's not what we discussed back in April.
> >
> > No, you mentioned turning aux->linked_prog into a list. However once I
> > started looking at it I figured it was better to actually have all this
> > (the trampoline and ref) as part of the bpf_link structure, since
> > logically they're related.
> >
> > But as you pointed out, the original reference sticks. So either that
> > needs to be removed, or I need to go back to the 'aux->linked_progs as a
> > list' idea. Any preference?
>
> Good question. Back then I was thinking about converting linked_prog into link
> list, since standalone single linked_prog is quite odd, because attaching ext
> prog to multiple tgt progs should have equivalent properties across all
> attachments.
> Back then bpf_link wasn't quite developed.
> Now I feel moving into bpf_tracing_link is better.
> I guess a link list of bpf_tracing_link-s from 'struct bpf_prog' might work.
> At prog load time we can do bpf_link_init() only (without doing bpf_link_prime)
> and keep this pre-populated bpf_link with target bpf prog and trampoline
> in a link list accessed from 'struct bpf_prog'.
> Then bpf_tracing_prog_attach() without extra tgt_prog_fd/btf_id would complete
> that bpf_tracing_link by calling bpf_link_prime() and bpf_link_settle()
> without allocating new one.
> Something like:
> struct bpf_tracing_link {
>         struct bpf_link link;  /* ext prog pointer is hidding in there */
>         enum bpf_attach_type attach_type;
>         struct bpf_trampoline *tr;
>         struct bpf_prog *tgt_prog; /* old aux->linked_prog */
> };
>
> ext prog -> aux -> link list of above bpf_tracing_link-s
>
> It's a circular reference, obviously.
> Need to think through the complications and locking.
>
> bpf_tracing_prog_attach() with tgt_prog_fd/btf_id will alloc new bpf_tracing_link
> and will add it to a link list.
>
> Just a rough idea. I wonder what Andrii thinks.
>

I need to spend more time reading existing and new code to see all the
details, but I'll throw a slightly different proposal and let you guys
shoot it down.

So, what if instead of having linked_prog (as bpf_prog *, refcnt'ed),
at BPF_PROG_LOAD time we just record the target prog's ID. BPF
verifier, when doing its target prog checks would attempt to get
bpf_prog * reference; if by that time the target program is gone,
fail, of course. If not, everything proceeds as is, at the end of
verification target_prog is put until attach time.

Then at attach time, we either go with pre-recorded (in
prog->aux->linked_prog_id) target prog's ID or we get a new one from
RAW_TP_OPEN tgt_prog_fd. Either way, we bump refcnt on that target
prog and keep it with bpf_tracing_link (so link on detach would put
target_prog, that way it doesn't go away while EXT prog is attached).
Then do all the compatibility checks, and if everything works out,
bpf_tracing_link gets created, we record trampoline there, etc, etc.
Basically, instead of having an EXT prog holding a reference to the
target prog, only attachment (bpf_link) does that, which conceptually
also seems to make more sense to me. For verification we store prog ID
and don't hold target prog at all.


Now, there will be a problem once you attach EXT prog to a new XDP
root program and release a link against the original XDP root program.
First, I hope I understand the desired sequence right, here's an
example:

1. load XDP root prog X
2. load EXT prog with target prog X
3. attach EXT prog to prog X
4. load XDP root prog Y
5. attach EXT prog to prog Y (Y and X should be "compatible")
6. detach prog X (close bpf_link)

Is that the right sequence?

If yes, then the problem with storing ID of prog X in EXT
prog->aux->linked_prog_id is that you won't be able to re-attach to
new prog Z, because there won't be anything to check compatibility
against (prog X will be long time gone).

So we can do two things here:

1. on attach, replace ext_prog->aux->linked_prog_id with the latest
attached prog (prog Y ID from above example)
2. instead of recording target program FD/ID, capture BTF FD and/or
enough BTF information for checking compatibility.

Approach 2) seems like conceptually the right thing to do (record type
info we care about, not an **instance** of BPF program, compatible
with that type info), but technically might be harder.


That's my thoughts without digging too deep, so sorry if I'm making
some stupid assumptions.



[...]

  parent reply	other threads:[~2020-07-20  5:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 13:08 [PATCH bpf-next v2 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
2020-07-16 10:10   ` Dan Carpenter
2020-07-16 10:10     ` Dan Carpenter
2020-07-16 10:10     ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
2020-07-15 20:44   ` Alexei Starovoitov
2020-07-16 10:50     ` Toke Høiland-Jørgensen
2020-07-17  2:05       ` Alexei Starovoitov
2020-07-17 10:52         ` Toke Høiland-Jørgensen
2020-07-20 22:57           ` Alexei Starovoitov
2020-07-20  5:02         ` Andrii Nakryiko [this message]
2020-07-20 23:34           ` Alexei Starovoitov
2020-07-21  3:48             ` Andrii Nakryiko
2020-07-22  0:29               ` Alexei Starovoitov
2020-07-22  6:02                 ` Andrii Nakryiko
2020-07-23  0:32                   ` Alexei Starovoitov
2020-07-23  0:56                     ` Andrii Nakryiko
2020-07-16 10:14   ` Dan Carpenter
2020-07-16 10:14     ` Dan Carpenter
2020-07-16 10:14     ` Dan Carpenter
2020-07-15 13:09 ` [PATCH bpf-next v2 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
2020-07-15 13:09 ` [PATCH bpf-next v2 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen
2020-07-15 20:59 [PATCH bpf-next v2 3/6] bpf: support attaching freplace programs to multiple attach points kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzYd4Xrn4EqzqHCTuJ8TnZiTC1vWWvd=9Np+LNrgbtxOcQ@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.